Abstract

When we cut an i.i.d. sequence of letters into words according to an independent renewal
process, we obtain an i.i.d. sequence of words. In the annealed large deviation principle (LDP)
for the empirical process of words, the rate function is the specific relative entropy of the
observed law of words w.r.t. the reference law of words. In the present paper we consider the
quenched LDP, i.e., we condition on a typical letter sequence. We focus on the case where the
renewal process has an algebraic tail. The rate function turns out to be a sum of two terms, one
being the annealed rate function, the other being proportional to the specific relative entropy
of the observed law of letters w.r.t. the reference law of letters, with the former being obtained
by concatenating the words and randomising the location of the origin. The proportionality
constant equals the tail exponent of the renewal process. Earlier work by Birkner considered
the case where the renewal process has an exponential tail, in which case the rate function turns
out to be the first term on the set where the second term vanishes and to be infinite elsewhere.

In a companion paper the annealed and the quenched LDP are applied to the collision local time
of transient random walks, and the existence of an intermediate phase for a class of interacting
stochastic systems is established.
1 Introduction and main results



1.1 Problem setting 

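Let $E$ be a finite set of letters. Let $\tilde E=\bigcup_{n\in\mathbb N}E^n$ be the set of finite words drawn from $E$. Both
$E$ and $\tilde E$ are Polish spaces under the discrete topology. Let $\mathcal P(E^{\mathbb N})$ and $\mathcal P(\tilde E^{\mathbb N})$ denote the set
of probability measures on sequences drawn from $E$, respectively, $\tilde E$, equipped with the topology
of weak convergence. Write $\theta$ and $\tilde\theta$ for the left-shift acting on $E^{\mathbb N}$, respectively, $\tilde E^{\mathbb N}$. Write
$\mathcal P^{\mathrm{inv}}(E^{\mathbb N})$, $\mathcal P^{\mathrm{erg}}(E^{\mathbb N})$ and $\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})$, $\mathcal P^{\mathrm{erg}}(\tilde E^{\mathbb N})$ for the set of probability measures that are invariant
and ergodic under $\theta$, respectively, $\tilde\theta$.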

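For $\nu\in\mathcal P(E)$, let $X=(X_i)_{i\in\mathbb N}$ be i.i.d. with law $\nu$. Without loss of generality we will assume
that $\mathrm{supp}(\nu)=E$ (otherwise we replace $E$ by $\mathrm{supp}(\nu)$). For $\rho\in\mathcal P(\mathbb N)$, let $\tau=(\tau_i)_{i\in\mathbb N}$ be i.i.d. with
law $\rho$ having infinite support and satisfying the algebraic tail property

$$\lim_{\substack{n\to\infty\\ \rho(n)>0}}\frac{\log\rho(n)}{\log n}=-\alpha,\qquad\alpha\in(1,\infty). \tag{1.1}$$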



(No regularity assumption will be necessary for $\mathrm{supp}(\rho)$.) Assume that $X$ and $\tau$ are independent
and write $\mathbb P$ to denote their joint law. Cut words out of $X$ according to $\tau$, i.e., put (see Figure 1)

$$T_0:=0\quad\text{and}\quad T_i:=T_{i-1}+\tau_i,\qquad i\in\mathbb N, \tag{1.2}$$

and let

$$Y^{(i)}:=\big(X_{T_{i-1}+1},X_{T_{i-1}+2},\dots,X_{T_i}\big),\qquad i\in\mathbb N. \tag{1.3}$$

Then, under the law $\mathbb P$, $Y=(Y^{(i)})_{i\in\mathbb N}$ is an i.i.d. sequence of words with marginal law $q_{\rho,\nu}$ on $\tilde E$
given by

$$q_{\rho,\nu}\big((x_1,\dots,x_n)\big):=\mathbb P\big(Y^{(1)}=(x_1,\dots,x_n)\big)=\rho(n)\,\nu(x_1)\cdots\nu(x_n),\qquad n\in\mathbb N,\ x_1,\dots,x_n\in E. \tag{1.4}$$




Figure 1: Cutting words from a letter sequence according to a renewal process. 
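The cutting construction (1.2)–(1.4) can be simulated directly. The Python sketch below is only an illustration under assumptions of our own: letters are one-character strings, and the renewal law is taken to be $\rho(n)=1/(n(n+1))$, which satisfies (1.1) with $\alpha=2$ and can be sampled by inversion because $\rho([n,\infty))=1/n$. The helper names (`rho`, `sample_length`, `cut_words`, `q_word`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Word = Tuple[str, ...]


def rho(n: int) -> float:
    """Illustrative renewal law rho(n) = 1/(n(n+1)), n >= 1 (tail exponent alpha = 2)."""
    return 1.0 / (n * (n + 1))


def sample_length(rng: random.Random) -> int:
    """Draw tau ~ rho by inversion: P(tau >= n) = 1/n, so tau = floor(1/U) with U in (0, 1]."""
    u = 1.0 - rng.random()  # rng.random() lies in [0, 1), so u lies in (0, 1]
    return int(1.0 / u)


def cut_words(letters: Sequence[str], lengths: Sequence[int]) -> List[Word]:
    """Cut letters at T_i = tau_1 + ... + tau_i as in (1.2)-(1.3); stop at the first incomplete word."""
    words: List[Word] = []
    t = 0  # current value of T_{i-1}
    for tau in lengths:
        if t + tau > len(letters):
            break
        words.append(tuple(letters[t:t + tau]))
        t += tau
    return words


def q_word(word: Word, nu: Dict[str, float]) -> float:
    """Reference word law (1.4): q(x_1, ..., x_n) = rho(n) nu(x_1) ... nu(x_n)."""
    return rho(len(word)) * math.prod(nu[x] for x in word)


if __name__ == "__main__":
    rng = random.Random(0)
    nu = {"a": 0.5, "b": 0.5}
    X = rng.choices(list(nu), weights=list(nu.values()), k=50)
    taus = [sample_length(rng) for _ in range(20)]
    print(cut_words(X, taus)[:5])
```

Because the cut points are drawn independently of the letters, the words produced this way are i.i.d. with law (1.4).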

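For $N\in\mathbb N$, let $(Y^{(1)},\dots,Y^{(N)})^{\mathrm{per}}$ stand for the periodic extension of $(Y^{(1)},\dots,Y^{(N)})$ to an
element of $\tilde E^{\mathbb N}$, and define

$$R_N:=\frac1N\sum_{i=0}^{N-1}\delta_{\tilde\theta^i(Y^{(1)},\dots,Y^{(N)})^{\mathrm{per}}}\in\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N}), \tag{1.5}$$

the empirical process of $N$-tuples of words. By the ergodic theorem, we have

$$\mathop{\text{w-lim}}_{N\to\infty}R_N=q_{\rho,\nu}^{\otimes\mathbb N}\qquad\mathbb P\text{-a.s.}, \tag{1.6}$$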



with w-lim denoting the weak limit. The following large deviation principle (LDP) is standard
(see e.g. Dembo and Zeitouni [5], Corollaries 6.5.15 and 6.5.17). For $Q\in\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})$ let

$$H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N}):=\lim_{N\to\infty}\frac1N\,h\big(Q|_{\mathcal F_N}\,\big|\,(q_{\rho,\nu}^{\otimes\mathbb N})|_{\mathcal F_N}\big)\in[0,\infty] \tag{1.7}$$

be the specific relative entropy of $Q$ w.r.t. $q_{\rho,\nu}^{\otimes\mathbb N}$, where $\mathcal F_N=\sigma(Y^{(1)},\dots,Y^{(N)})$ is the sigma-algebra
generated by the first $N$ words, $Q|_{\mathcal F_N}$ is the restriction of $Q$ to $\mathcal F_N$, and $h(\cdot\mid\cdot)$ denotes relative
entropy. (For general properties of entropy, see Walters [13], Chapter 4.)
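For finitely supported laws the objects in (1.5)–(1.7) can be computed directly. The sketch below (hypothetical helper names; words are represented by any hashable Python value) computes the law of the first $k$ words under $R_N$, i.e., the average over the $N$ cyclic shifts of the periodically extended sentence, together with the relative entropy $h(\cdot\mid\cdot)$. It also illustrates that when $Q=q^{\otimes\mathbb N}$ is itself a product measure, the ratio under the limit in (1.7) does not depend on $N$, so that $H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})=h(q\mid q_{\rho,\nu})$.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, Hashable, Sequence, Tuple


def window_law(words: Sequence[Hashable], k: int) -> Dict[Tuple, float]:
    """Law of the first k words under R_N in (1.5): average over the N cyclic shifts
    of the periodically extended sentence (Y^(1), ..., Y^(N))^per."""
    n = len(words)
    counts = Counter(tuple(words[(i + j) % n] for j in range(k)) for i in range(n))
    return {w: c / n for w, c in counts.items()}


def relative_entropy(p: Dict, q: Dict) -> float:
    """h(p | q) = sum_x p(x) log(p(x)/q(x)); equals +inf unless p << q."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total


def product_law(q: Dict, k: int) -> Dict[Tuple, float]:
    """k-fold product of q, i.e. the restriction of q^{(x)N} to F_k."""
    return {xs: math.prod(q[x] for x in xs) for xs in product(q, repeat=k)}
```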

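Theorem 1.1. [Annealed LDP] The family of probability distributions $\mathbb P(R_N\in\cdot\,)$, $N\in\mathbb N$,
satisfies the LDP on $\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})$ with rate $N$ and with rate function $I^{\mathrm{ann}}\colon\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})\to[0,\infty]$ given
by

$$I^{\mathrm{ann}}(Q)=H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N}). \tag{1.8}$$

This rate function is lower semi-continuous, has compact level sets, has a unique zero at $Q=q_{\rho,\nu}^{\otimes\mathbb N}$,
and is affine.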

The LDP for $R_N$ arises from the LDP for $N$-tuples via a projective limit theorem. The ratio
under the limit in (1.7) is the rate function for $N$-tuples according to Sanov's theorem (see e.g. den
Hollander [8], Section II.5), and is non-decreasing in $N$.

1.2 Main theorems 

Our aim in the present paper is to derive the LDP for $\mathbb P(R_N\in\cdot\mid X)$, $N\in\mathbb N$. To state our result,
we need some more notation.

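Let $\kappa\colon\tilde E^{\mathbb N}\to E^{\mathbb N}$ denote the concatenation map that glues a sequence of words into a sequence
of letters. For $Q\in\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})$ such that

$$m_Q:=E_Q[\tau_1]<\infty \tag{1.9}$$

(here $\tau_1=|Y^{(1)}|$ denotes the length of the first word), define $\Psi_Q\in\mathcal P^{\mathrm{inv}}(E^{\mathbb N})$ as

$$\Psi_Q(\cdot):=\frac1{m_Q}\,E_Q\Big[\sum_{k=0}^{\tau_1-1}\delta_{\theta^k\kappa(Y)}(\cdot)\Big]. \tag{1.10}$$

Think of $\Psi_Q$ as the shift-invariant version of the concatenation of $Y$ under the law $Q$ obtained after
randomising the location of the origin.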
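For $0\le k<\tau_1$, the letter at the origin of $\theta^k\kappa(Y)$ is the $(k+1)$-st letter of $Y^{(1)}$. Hence the one-letter marginal of $\Psi_Q$ in (1.10) depends only on the law of the first word: $\Psi_Q(X_1=x)=m_Q^{-1}E_Q[\#\{1\le k\le\tau_1\colon Y^{(1)}_k=x\}]$. A minimal Python sketch for a finitely supported first-word law (the function names are our own):

```python
from collections import defaultdict
from typing import Dict, Tuple

Word = Tuple[str, ...]


def mean_length(q: Dict[Word, float]) -> float:
    """m_Q = E_Q[tau_1] in (1.9), computed from the law q of the first word."""
    return sum(p * len(w) for w, p in q.items())


def psi_one_letter(q: Dict[Word, float]) -> Dict[str, float]:
    """One-letter marginal of Psi_Q in (1.10): each letter of the first word is
    counted once (one term per shift k < tau_1), and the total is divided by m_Q."""
    m = mean_length(q)
    out: Dict[str, float] = defaultdict(float)
    for w, p in q.items():
        for x in w:
            out[x] += p / m
    return dict(out)
```

For instance, if the first word is `a` or `ab` with probability 1/2 each, then $m_Q=3/2$ and the one-letter marginal of $\Psi_Q$ puts mass 2/3 on `a` and 1/3 on `b`.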

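For $\mathrm{tr}\in\mathbb N$, let $[\cdot]_{\mathrm{tr}}\colon\tilde E\to[\tilde E]_{\mathrm{tr}}:=\bigcup_{n=1}^{\mathrm{tr}}E^n$ denote the word length truncation map defined by

$$y=(x_1,\dots,x_n)\mapsto[y]_{\mathrm{tr}}:=(x_1,\dots,x_{n\wedge\mathrm{tr}}),\qquad n\in\mathbb N,\ x_1,\dots,x_n\in E. \tag{1.11}$$

Extend this to a map from $\tilde E^{\mathbb N}$ to $[\tilde E]_{\mathrm{tr}}^{\mathbb N}$ via

$$\big[(y^{(1)},y^{(2)},\dots)\big]_{\mathrm{tr}}:=\big([y^{(1)}]_{\mathrm{tr}},[y^{(2)}]_{\mathrm{tr}},\dots\big) \tag{1.12}$$

and to a map from $\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})$ to $\mathcal P^{\mathrm{inv}}([\tilde E]_{\mathrm{tr}}^{\mathbb N})$ via

$$[Q]_{\mathrm{tr}}(A):=Q\big(\{z\in\tilde E^{\mathbb N}\colon[z]_{\mathrm{tr}}\in A\}\big),\qquad A\subset[\tilde E]_{\mathrm{tr}}^{\mathbb N}\ \text{measurable}. \tag{1.13}$$

Note that if $Q\in\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})$, then $[Q]_{\mathrm{tr}}$ is an element of the set

$$\mathcal P^{\mathrm{inv,fin}}(\tilde E^{\mathbb N}):=\{Q\in\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})\colon m_Q<\infty\}. \tag{1.14}$$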
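The point of (1.14) is that truncation always produces a finite mean word length, even when $m_Q=\infty$. As a hedged illustration, take word lengths distributed according to $\rho(n)=1/(n(n+1))$. This is our own choice, with tail exponent $\alpha=2$ and infinite mean. Then $E[\tau_1\wedge\mathrm{tr}]=\sum_{n<\mathrm{tr}}n\rho(n)+\mathrm{tr}\,\rho([\mathrm{tr},\infty))=\sum_{k=1}^{\mathrm{tr}}1/k$, which is finite for every tr but grows like $\log\mathrm{tr}$:

```python
from typing import Tuple

Word = Tuple[str, ...]


def truncate(word: Word, tr: int) -> Word:
    """Word length truncation map (1.11): keep the first min(n, tr) letters."""
    return word[:tr]


def truncated_mean_length(tr: int) -> float:
    """E[tau_1 ^ tr] for the illustrative law rho(n) = 1/(n(n+1))."""
    head = sum(n * (1.0 / (n * (n + 1))) for n in range(1, tr))  # lengths n < tr are kept
    tail = tr * (1.0 / tr)  # lengths >= tr are cut to tr; rho([tr, inf)) = 1/tr
    return head + tail
```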

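Theorem 1.2. [Quenched LDP] Assume (1.1). Then, for $\nu^{\otimes\mathbb N}$-a.s. all $X$, the family of (regular)
conditional probability distributions $\mathbb P(R_N\in\cdot\mid X)$, $N\in\mathbb N$, satisfies the LDP on $\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})$ with
rate $N$ and with deterministic rate function $I^{\mathrm{que}}\colon\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})\to[0,\infty]$ given by

$$I^{\mathrm{que}}(Q):=\begin{cases}I^{\mathrm{fin}}(Q), & \text{if }Q\in\mathcal P^{\mathrm{inv,fin}}(\tilde E^{\mathbb N}),\\ \lim_{\mathrm{tr}\to\infty}I^{\mathrm{fin}}([Q]_{\mathrm{tr}}), & \text{otherwise},\end{cases} \tag{1.15}$$

where

$$I^{\mathrm{fin}}(Q):=H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})+(\alpha-1)\,m_Q\,H(\Psi_Q\mid\nu^{\otimes\mathbb N}). \tag{1.16}$$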



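Theorem 1.3. The rate function $I^{\mathrm{que}}$ is lower semi-continuous, has compact level sets, has a
unique zero at $Q=q_{\rho,\nu}^{\otimes\mathbb N}$, and is affine. Moreover, it is equal to the lower semi-continuous extension
of $I^{\mathrm{fin}}$ from $\mathcal P^{\mathrm{inv,fin}}(\tilde E^{\mathbb N})$ to $\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})$.

Theorem 1.2 will be proved in Sections 3–5, Theorem 1.3 in Section 6.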

A remarkable aspect of (1.16) in relation to (1.8) is that it quantifies the difference between the
quenched and the annealed rate function. Note the appearance of the tail exponent $\alpha$. We have
not been able to find a simple formula for $I^{\mathrm{que}}(Q)$ when $m_Q=\infty$. In Appendix A we will show
that the annealed and the quenched rate function are continuous under truncation of word lengths,
i.e.,

$$I^{\mathrm{ann}}(Q)=\lim_{\mathrm{tr}\to\infty}I^{\mathrm{ann}}([Q]_{\mathrm{tr}}),\qquad I^{\mathrm{que}}(Q)=\lim_{\mathrm{tr}\to\infty}I^{\mathrm{que}}([Q]_{\mathrm{tr}}),\qquad Q\in\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N}). \tag{1.17}$$
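A case where everything in (1.16) is explicit may help to see the role of $\alpha$. Suppose, as an assumption for this example only, that $Q=q^{\otimes\mathbb N}$, where the word length has a finitely supported law $\lambda$ and, given the length, the letters are i.i.d. with law $\mu$. Then $\Psi_Q=\mu^{\otimes\mathbb N}$ and $H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})=h(\lambda\mid\rho)+m_Q\,h(\mu\mid\nu)$, so (1.16) becomes $I^{\mathrm{fin}}(Q)=h(\lambda\mid\rho)+\alpha\,m_Q\,h(\mu\mid\nu)$, whereas $I^{\mathrm{ann}}(Q)=h(\lambda\mid\rho)+m_Q\,h(\mu\mid\nu)$. The Python sketch below evaluates both, again with the illustrative choice $\rho(n)=1/(n(n+1))$, $\alpha=2$:

```python
import math
from typing import Dict

ALPHA = 2.0  # tail exponent of the illustrative law rho(n) = 1/(n(n+1))


def rho(n: int) -> float:
    return 1.0 / (n * (n + 1))


def kl(p: Dict, q: Dict) -> float:
    """Relative entropy h(p | q) for finitely supported laws with q > 0 on supp(p)."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def rates(lam: Dict[int, float], mu: Dict[str, float], nu: Dict[str, float], alpha: float = ALPHA):
    """Return (I_ann, I_fin) for Q = q^{(x)N} with q(x_1..x_n) = lam(n) mu(x_1) ... mu(x_n)."""
    m = sum(n * p for n, p in lam.items())         # m_Q
    h_len = kl(lam, {n: rho(n) for n in lam})       # h(lambda | rho)
    h_let = kl(mu, nu)                              # h(mu | nu) = H(Psi_Q | nu^N) here
    i_ann = h_len + m * h_let                       # (1.8)
    i_fin = i_ann + (alpha - 1.0) * m * h_let       # (1.16)
    return i_ann, i_fin
```

When $\mu=\nu$ the extra quenched cost vanishes, consistent with $\Psi_Q=\nu^{\otimes\mathbb N}$; a bias in the letters is paid $\alpha$ times instead of once.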

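Theorem 1.2 is an extension of Birkner [2], Theorem 1. In that paper, the quenched LDP is
derived under the assumption that the law $\rho$ satisfies the exponential tail property

$$\exists\,C<\infty,\ \lambda>0\colon\quad\rho(n)\le Ce^{-\lambda n}\quad\forall\,n\in\mathbb N \tag{1.18}$$

(which includes the case where $\mathrm{supp}(\rho)$ is finite). The rate function governing the LDP is given by

$$I^{\mathrm{que}}(Q)=\begin{cases}H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N}), & \text{if }Q\in\mathcal R_\nu,\\ \infty, & \text{otherwise},\end{cases} \tag{1.19}$$

where

$$\mathcal R_\nu:=\Big\{Q\in\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})\colon\mathop{\text{w-lim}}_{L\to\infty}\frac1L\sum_{k=0}^{L-1}\delta_{\theta^k\kappa(Y)}=\nu^{\otimes\mathbb N}\ \ Q\text{-a.s.}\Big\}. \tag{1.20}$$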

Think of $\mathcal R_\nu$ as the set of those $Q$'s for which the concatenation of words has the same statistical
properties as the letter sequence $X$. This set is not closed in the weak topology: its closure is

We can include the cases where $\rho$ satisfies (1.1) with $\alpha=1$ or $\alpha=\infty$.



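Theorem 1.4. (a) If $\alpha=1$, then the quenched LDP holds with $I^{\mathrm{que}}=I^{\mathrm{ann}}$ given by (1.8).
(b) If $\alpha=\infty$, then the quenched LDP holds with rate function

$$I^{\mathrm{que}}(Q)=\begin{cases}H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N}), & \text{if }\lim_{\mathrm{tr}\to\infty}m_{[Q]_{\mathrm{tr}}}\,H(\Psi_{[Q]_{\mathrm{tr}}}\mid\nu^{\otimes\mathbb N})=0,\\ \infty, & \text{otherwise}.\end{cases} \tag{1.21}$$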

Theorem 1.4 will be proved in Section 7. Part (a) says that the quenched and the annealed rate
function are identical when $\alpha=1$. Part (b) says that (1.19) can be viewed as the limiting case of
(1.16) as $\alpha\to\infty$. Indeed, it was shown in Birkner [2], Lemma 2, that on $\mathcal P^{\mathrm{inv,fin}}(\tilde E^{\mathbb N})$:

$$\Psi_Q=\nu^{\otimes\mathbb N}\quad\text{if and only if}\quad Q\in\mathcal R_\nu. \tag{1.22}$$

Hence, (1.21) and (1.19) agree on $\mathcal P^{\mathrm{inv,fin}}(\tilde E^{\mathbb N})$, and the rate function (1.21) is the lower semicontinuous
extension of (1.19) to $\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})$. By Birkner [2], Lemma 7, the expressions in (1.21) and
(1.19) are identical if $\rho$ has exponentially decaying tails. In this sense, Part (b) generalises the
result in Birkner [2], Theorem 1, to arbitrary $\rho$ with a tail that decays faster than algebraic.

Let $\pi_1\colon\tilde E^{\mathbb N}\to\tilde E$ be the projection onto the first word, and let $\mathcal P(\tilde E)$ be the set of probability
measures on $\tilde E$. An application of the contraction principle to Theorem 1.2 yields the following.



Corollary 1.5. Under the assumptions of Theorem 1.2, for $\nu^{\otimes\mathbb N}$-a.s. all $X$, the family of (regular)
conditional probability distributions $\mathbb P(\pi_1R_N\in\cdot\mid X)$, $N\in\mathbb N$, satisfies the LDP on $\mathcal P(\tilde E)$ with rate
$N$ and with deterministic rate function $I_1^{\mathrm{que}}\colon\mathcal P(\tilde E)\to[0,\infty]$ given by

$$I_1^{\mathrm{que}}(q):=\inf\big\{I^{\mathrm{que}}(Q)\colon Q\in\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N}),\ \pi_1Q=q\big\}. \tag{1.23}$$

This rate function is lower semi-continuous, has compact level sets, has a unique zero at $q=q_{\rho,\nu}$,
and is convex.

Corollary 1.5 shows that the rate function in Birkner [1], Theorem 6, must be replaced by (1.23).
It does not appear possible to evaluate the infimum in (1.23) explicitly in general. For a $q\in\mathcal P(\tilde E)$
with finite mean length and $\Psi_{q^{\otimes\mathbb N}}=\nu^{\otimes\mathbb N}$, we have $I_1^{\mathrm{que}}(q)=h(q\mid q_{\rho,\nu})$.

By taking projective limits, it is possible to extend Theorems 1.2–1.3 to more general letter
spaces. See, e.g., Deuschel and Stroock [6], Section 4.4, or Dembo and Zeitouni [5], Section 6.5, for
background on (specific) relative entropy in general spaces. The following corollary will be proved
in Section 8.

Corollary 1.6. The quenched LDP also holds when $E$ is a Polish space, with the same rate function
as in (1.15)–(1.16).

In the companion paper [3] the annealed and quenched LDP are applied to the collision local
time of transient random walks, and the existence of an intermediate phase for a class of interacting
stochastic systems is established.

1.3 Heuristic explanation of main theorems 

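To explain the background of Theorem 1.2, we begin by recalling a few properties of entropy. Let
$H(Q)$ denote the specific entropy of $Q\in\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})$ defined by

$$H(Q):=\lim_{N\to\infty}\frac1N\,h\big(Q|_{\mathcal F_N}\big)\in[0,\infty], \tag{1.24}$$

where $h(\cdot)$ denotes entropy. The sequence under the limit in (1.24) is non-increasing in $N$. Since
$q_{\rho,\nu}^{\otimes\mathbb N}$ is a product measure, we have the identity (recall (1.2)–(1.4))

$$\begin{aligned}H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})&=-H(Q)-E_Q\big[\log q_{\rho,\nu}(Y^{(1)})\big]\\&=-H(Q)-E_Q[\log\rho(\tau_1)]-m_Q\,E_{\Psi_Q}[\log\nu(X_1)].\end{aligned} \tag{1.25}$$

Similarly,

$$H(\Psi_Q\mid\nu^{\otimes\mathbb N})=-H(\Psi_Q)-E_{\Psi_Q}[\log\nu(X_1)]. \tag{1.26}$$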

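Below, for a discrete random variable $Z$ with a law $Q$ on a state space $\mathcal Z$ we will write $Q(Z)$
for the random variable $f(Z)$ with $f(z)=Q(Z=z)$, $z\in\mathcal Z$. Abbreviate

$$K^{(N)}:=\kappa(Y^{(1)},\dots,Y^{(N)})\qquad\text{and}\qquad K^{(\infty)}:=\kappa(Y). \tag{1.27}$$

In analogy with (1.14), define

$$\mathcal P^{\mathrm{erg,fin}}(\tilde E^{\mathbb N}):=\{Q\in\mathcal P^{\mathrm{erg}}(\tilde E^{\mathbb N})\colon m_Q<\infty\}. \tag{1.28}$$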



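Lemma 1.7. [Birkner [2], Lemmas 3 and 4]

Suppose that $Q\in\mathcal P^{\mathrm{erg,fin}}(\tilde E^{\mathbb N})$ and $H(Q)<\infty$. Then, $Q$-a.s.,

$$\begin{aligned}&\lim_{N\to\infty}\frac1N\log Q(K^{(N)})=-m_QH(\Psi_Q),\\&\lim_{N\to\infty}\frac1N\log Q(\tau_1,\dots,\tau_N\mid K^{(N)})=:-H_{\tau|K}(Q),\\&\lim_{N\to\infty}\frac1N\log Q(Y^{(1)},\dots,Y^{(N)})=-H(Q),\end{aligned} \tag{1.29}$$

and

$$m_QH(\Psi_Q)+H_{\tau|K}(Q)=H(Q). \tag{1.30}$$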

Equation (1.30), which follows from (1.29) and the identity

$$Q(K^{(N)})\,Q(\tau_1,\dots,\tau_N\mid K^{(N)})=Q(Y^{(1)},\dots,Y^{(N)}), \tag{1.31}$$

identifies $H_{\tau|K}(Q)$. Think of $H_{\tau|K}(Q)$ as the conditional specific entropy of word lengths under the
law $Q$ given the concatenation. Combining (1.25)–(1.26) and (1.30), we have

$$H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})=m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})-H_{\tau|K}(Q)-E_Q[\log\rho(\tau_1)]. \tag{1.32}$$

The term $-H_{\tau|K}(Q)-E_Q[\log\rho(\tau_1)]$ in (1.32) can be interpreted as the conditional specific relative
entropy of word lengths under the law $Q$ w.r.t. $\rho^{\otimes\mathbb N}$ given the concatenation.

Note that $m_Q<\infty$ and $H(Q)<\infty$ imply that $H(\Psi_Q)<\infty$, as can be seen from (1.30). Also
note that $-E_{\Psi_Q}[\log\nu(X_1)]<\infty$ because $E$ is finite, and $-E_Q[\log\rho(\tau_1)]<\infty$ because of (1.1) and
$m_Q<\infty$, implying that (1.25)–(1.26) are proper.
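Identity (1.32) can be checked numerically in the example of words with i.i.d. letters, which is an assumption made only for this check: $Q=q^{\otimes\mathbb N}$ with $q(x_1,\dots,x_n)=\lambda(n)\mu(x_1)\cdots\mu(x_n)$. There $H(Q)=h(q)$, $H(\Psi_Q)=h(\mu)$, and (1.30) gives $H_{\tau|K}(Q)=h(q)-m_Q\,h(\mu)$. This equals the entropy $h(\lambda)$ of the length law, because the concatenation carries no information about the cut points. The sketch below (hypothetical names, illustrative $\rho(n)=1/(n(n+1))$) enumerates all words, computes the left-hand side of (1.32) as $h(q\mid q_{\rho,\nu})$, and compares it with the right-hand side.

```python
import math
from itertools import product
from typing import Dict, Tuple


def rho(n: int) -> float:
    """Illustrative renewal law rho(n) = 1/(n(n+1))."""
    return 1.0 / (n * (n + 1))


def words(lam: Dict[int, float], mu: Dict[str, float]) -> Dict[Tuple[str, ...], float]:
    """q(x_1..x_n) = lam(n) mu(x_1) ... mu(x_n) over all words whose length n is in supp(lam)."""
    return {w: lam[n] * math.prod(mu[x] for x in w)
            for n in lam for w in product(mu, repeat=n)}


def entropy(p: Dict) -> float:
    return -sum(v * math.log(v) for v in p.values() if v > 0)


def check_1_32(lam, mu, nu):
    q = words(lam, mu)
    ref = {w: rho(len(w)) * math.prod(nu[x] for x in w) for w in q}          # q_{rho,nu} on supp(q)
    lhs = sum(v * math.log(v / ref[w]) for w, v in q.items() if v > 0)      # H(Q | q^N) = h(q | q_{rho,nu})
    m = sum(n * v for n, v in lam.items())                                   # m_Q
    h_tau_k = entropy(q) - m * entropy(mu)                                   # H_{tau|K}(Q) via (1.30)
    h_psi_nu = sum(v * math.log(v / nu[x]) for x, v in mu.items() if v > 0)  # H(Psi_Q | nu^N)
    e_log_rho = sum(v * math.log(rho(n)) for n, v in lam.items())            # E_Q[log rho(tau_1)]
    rhs = m * h_psi_nu - h_tau_k - e_log_rho                                 # right-hand side of (1.32)
    return lhs, rhs, h_tau_k
```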

We are now ready to give a heuristic explanation of Theorem 1.2. Let

$$R_N^{j_1,\dots,j_N}(X),\qquad0<j_1<\dots<j_N<\infty, \tag{1.33}$$

denote the empirical process of $N$-tuples of words when $X$ is cut at the points $j_1,\dots,j_N$ (i.e.,
when $T_i=j_i$ for $i=1,\dots,N$; see (3.16)–(3.17) for a precise definition). Fix $Q\in\mathcal P^{\mathrm{erg,fin}}(\tilde E^{\mathbb N})$.
The probability $\mathbb P(R_N\approx Q\mid X)$ is a sum over all $N$-tuples $j_1,\dots,j_N$ such that $R_N^{j_1,\dots,j_N}(X)\approx Q$,
weighted by $\prod_{i=1}^N\rho(j_i-j_{i-1})$ (with $j_0=0$). The fact that $R_N^{j_1,\dots,j_N}(X)\approx Q$ has three consequences:

(1) The $j_1,\dots,j_N$ must cut $\approx N$ substrings out of $X$ of total length $\approx Nm_Q$ that look like the
concatenation of words that are $Q$-typical, i.e., that look as if generated by $\Psi_Q$ (possibly
with gaps in between). This means that most of the cut-points must hit atypical pieces of
$X$. We expect to have to shift $X$ by $\approx\exp[Nm_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})]$ in order to find the first
contiguous substring of length $Nm_Q$ whose empirical shifts lie in a small neighbourhood of
$\Psi_Q$. By (1.1), the probability for the single increment $j_1-j_0$ to have the size of this shift is
$\approx\exp[-N\alpha m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})]$.

(2) The combinatorial factor $\exp[NH_{\tau|K}(Q)]$ counts how many "local perturbations" of $j_1,\dots,j_N$
preserve the property that $R_N^{j_1,\dots,j_N}(X)\approx Q$.

(3) The statistics of the increments $j_1-j_0,\dots,j_N-j_{N-1}$ must be close to the distribution of word
lengths under $Q$. Hence, the weight factor $\prod_{i=1}^N\rho(j_i-j_{i-1})$ must be $\approx\exp[NE_Q[\log\rho(\tau_1)]]$
(at least, for $Q$-typical pieces).



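The contributions from (1)–(3), together with the identity in (1.32), explain the formula in (1.16)
on $\mathcal P^{\mathrm{erg,fin}}(\tilde E^{\mathbb N})$: they add up to $-\alpha m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})+H_{\tau|K}(Q)+E_Q[\log\rho(\tau_1)]=-I^{\mathrm{fin}}(Q)$.
Considerable work is needed to extend (1)–(3) from $\mathcal P^{\mathrm{erg,fin}}(\tilde E^{\mathbb N})$ to $\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})$. This
is explained in Section 3.5.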
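Step (1) rests on the fact that in an i.i.d. letter sequence one has to wait a time of order $1/\nu^{\otimes L}(w)$ to see a prescribed string $w$ of length $L$. For a string with no self-overlap, such as `ab`, the mean waiting time until the end of the first occurrence is exactly $1/\nu^{\otimes L}(w)$. The Monte Carlo sketch below (names and parameters are illustrative) estimates this mean for `ab` under the uniform law on $\{a,b\}$, where $1/\nu^{\otimes2}(\mathtt{ab})=4$. In (1) the same reasoning is applied to the set of all $\Psi_Q$-typical strings of length $Nm_Q$, whose total $\nu^{\otimes\mathbb N}$-probability is $\approx\exp[-Nm_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})]$.

```python
import random
from typing import Dict


def first_occurrence_end(word: str, nu: Dict[str, float], rng: random.Random,
                         max_steps: int = 10**7) -> int:
    """End position (1-based) of the first occurrence of `word` in an i.i.d.
    nu-distributed letter sequence, generated letter by letter."""
    letters = list(nu)
    weights = [nu[x] for x in letters]
    window = ""
    for t in range(1, max_steps + 1):
        window = (window + rng.choices(letters, weights=weights)[0])[-len(word):]
        if window == word:
            return t
    raise RuntimeError("word not found within max_steps")


def mean_waiting_time(word: str, nu: Dict[str, float], trials: int, seed: int = 0) -> float:
    """Monte Carlo average of first_occurrence_end over independent letter sequences."""
    rng = random.Random(seed)
    return sum(first_occurrence_end(word, nu, rng) for _ in range(trials)) / trials
```

For the self-overlapping string `aa` the mean is 6 rather than 4, which is one reason why the heuristic is only a statement on the exponential scale.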

In (1), instead of having a single large increment preceding a single contiguous substring of length
$Nm_Q$, it is possible to have several large increments preceding several contiguous substrings, which
together have length $Nm_Q$. The latter gives rise to the same contribution, and so there is some
entropy associated with the choice of the large increments. Lemma 2.1 in Section 2.1 is needed to
control this entropy, and shows that it is negligible.

1.4 Outline 

Section 2 collects some preparatory facts that are needed for the proofs of the main theorems,
including a lemma that controls the entropy associated with the locations of the large increments
in the renewal process. In Sections 3 and 4 we prove the large deviation upper, respectively, lower
bound. The proof of the former is long (taking up about half of the paper) and requires a somewhat
lengthy construction with combinatorial, functional analytic and ergodic theoretic ingredients. In
particular, extending the lower bound from ergodic to non-ergodic probability measures is technically
involved. The proofs of Theorems 1.2–1.4 are in Sections 5–7, that of Corollary 1.6 is in
Section 8. Appendix A contains a proof that the annealed and the quenched rate function are
continuous under truncation of word lengths.

2 Preparatory facts 

Section 2.1 proves a core lemma that is needed to control the entropy of large increments in the
renewal process. Section 2.2 shows that the tail property of $\rho$ is preserved under convolutions.

2.1 A core lemma 

As announced at the end of Section 1.3, we need to account for the entropy that is associated
with the locations of the large increments in the renewal process. This requires the following
combinatorial lemma.

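Lemma 2.1. Let $\omega=(\omega_i)_{i\in\mathbb N}$ be i.i.d. with $\mathbb P(\omega_1=1)=1-\mathbb P(\omega_1=0)=p\in(0,1)$, and let
$\alpha\in(1,\infty)$. For $N\in\mathbb N$, let

$$S_N(\omega):=\sum_{0<j_1<\dots<j_N<\infty}\ \prod_{i=1}^N\omega_{j_i}\,(j_i-j_{i-1})^{-\alpha}\qquad(j_0:=0), \tag{2.1}$$

and put

$$\limsup_{N\to\infty}\frac1N\log S_N(\omega)=:-\phi(\alpha,p)\qquad\omega\text{-a.s.} \tag{2.2}$$

(the limit being $\omega$-a.s. constant by tail triviality). Then

$$\lim_{p\downarrow0}\frac{\phi(\alpha,p)}{\alpha\log(1/p)}=1. \tag{2.3}$$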

Proof. Let $t_N:=\min\{l\in\mathbb N\colon\omega_l=\omega_{l+1}=\dots=\omega_{l+N-1}=1\}$. In (2.1), choosing $j_1=t_N$ and
$j_i=j_{i-1}+1$ for $i=2,\dots,N$, we see that $S_N(\omega)\ge t_N^{-\alpha}$. Since

$$\lim_{N\to\infty}\frac1N\log t_N=\log(1/p)\qquad\omega\text{-a.s.}, \tag{2.4}$$

we have

$$\phi(\alpha,p)\le\alpha\log(1/p)\qquad\forall\,p\in(0,1). \tag{2.5}$$

To show that this bound is sharp in the limit as $p\downarrow0$, we estimate fractional moments of $S_N(\omega)$.
For any $\beta\in(1/\alpha,1]$, using that $(u+v)^\beta\le u^\beta+v^\beta$, $u,v\ge0$, we get



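$$E\big[S_N(\omega)^\beta\big]\le E\Big[\sum_{0<j_1<\dots<j_N<\infty}\ \prod_{i=1}^N\omega_{j_i}\,(j_i-j_{i-1})^{-\alpha\beta}\Big]=\sum_{0<j_1<\dots<j_N<\infty}p^N\prod_{i=1}^N(j_i-j_{i-1})^{-\alpha\beta}=\big[p\,\zeta(\alpha\beta)\big]^N, \tag{2.6}$$

where $\zeta(s)=\sum_{n\in\mathbb N}n^{-s}$, $s>1$, is Riemann's $\zeta$-function. Hence, for any $\varepsilon>0$, Markov's inequality
yields

$$\mathbb P\Big(\frac1N\log S_N(\omega)>\frac1\beta\big[\log p+\log\zeta(\alpha\beta)+\varepsilon\big]\Big)=\mathbb P\Big(S_N(\omega)^\beta>e^{\varepsilon N}\big[p\,\zeta(\alpha\beta)\big]^N\Big)\le e^{-\varepsilon N}\big[p\,\zeta(\alpha\beta)\big]^{-N}E\big[S_N(\omega)^\beta\big]\le e^{-\varepsilon N}. \tag{2.7}$$

Thus, by the first Borel–Cantelli Lemma (the right-hand side of (2.7) is summable in $N$, and $\varepsilon>0$ is arbitrary),

$$-\phi(\alpha,p)=\limsup_{N\to\infty}\frac1N\log S_N(\omega)\le\frac1\beta\big[\log p+\log\zeta(\alpha\beta)\big]\qquad\omega\text{-a.s.} \tag{2.8}$$

Now let $p\downarrow0$, followed by $\beta\downarrow1/\alpha$, to obtain the claim. $\square$

Remark 2.2. Note that $E[S_N(\omega)]=(p\,\zeta(\alpha))^N$, while typically $S_N(\omega)\approx p^{\alpha N}$. In the above
computation, this is verified by bounding suitable non-integer moments of $S_N(\omega)$. Estimating
non-integer moments in situations when the mean is inconclusive is a useful technique in a variety
of different probabilistic contexts. See, e.g., Holley and Liggett [9] and Toninelli [12]. The proof of
Lemma 2.1 above is similar to that of Toninelli [12], Theorem 2.1.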
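The quantity $S_N(\omega)$ in (2.1) can be evaluated on a finite stretch $\omega_1,\dots,\omega_n$ by dynamic programming, restricting the sum to $j_N\le n$, so that the result is a lower bound for $S_N(\omega)$. The Python sketch below, with hypothetical function names, uses $f_k(j)=\omega_j\sum_{i<j}f_{k-1}(i)(j-i)^{-\alpha}$ with $f_0=\delta_0$. It also computes $t_N$ from the proof, so that the bound $S_N(\omega)\ge t_N^{-\alpha}$ can be checked.

```python
from typing import List, Optional, Sequence


def s_n_truncated(omega: Sequence[int], n_cuts: int, alpha: float) -> float:
    """Partial sum of (2.1) over 0 < j_1 < ... < j_N <= len(omega); omega[j-1] = omega_j."""
    n = len(omega)
    f: List[float] = [1.0] + [0.0] * n  # f_0 = delta_0, i.e. j_0 = 0
    for _ in range(n_cuts):
        g = [0.0] * (n + 1)
        for j in range(1, n + 1):
            if omega[j - 1]:
                g[j] = sum(f[i] * (j - i) ** (-alpha) for i in range(j) if f[i] > 0.0)
        f = g
    return sum(f[1:])


def first_run_start(omega: Sequence[int], n_cuts: int) -> Optional[int]:
    """t_N = min{l : omega_l = ... = omega_{l+N-1} = 1}, or None if no such run is observed."""
    run = 0
    for j, w in enumerate(omega, start=1):
        run = run + 1 if w else 0
        if run == n_cuts:
            return j - n_cuts + 1
    return None
```

One can use it to compare $\frac1N\log S_N(\omega)$ for sampled $\omega$ with $\log(p\,\zeta(\alpha))$, the exponential rate of the mean in Remark 2.2.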

2.2 Convolution preserves polynomial tail 

The following lemma will be needed in Sections 3.3 and 3.5. For $m\in\mathbb N$, let $\rho^{*m}$ denote the $m$-fold
convolution of $\rho$.

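Lemma 2.3. Suppose that $\rho$ satisfies $\rho(n)\le C_\rho n^{-\alpha}$, $n\in\mathbb N$, for some $C_\rho<\infty$. Then

$$\rho^{*m}(n)\le(C_\rho\vee1)\,m^{\alpha+1}n^{-\alpha}\qquad\forall\,m,n\in\mathbb N. \tag{2.9}$$

Proof. If $n<m$, then the right-hand side of (2.9) is $\ge1$. So, let us assume that $n\ge m$. Then

$$\begin{aligned}\rho^{*m}(n)&=\sum_{\substack{x_1,\dots,x_m\ge1\\x_1+\dots+x_m=n}}\ \prod_{i=1}^m\rho(x_i)\le\sum_{j=1}^m\ \sum_{\substack{x_1,\dots,x_m\ge1\\x_1+\dots+x_m=n\\x_j=x_1\vee\dots\vee x_m}}\rho(x_j)\prod_{i\ne j}\rho(x_i)\\&\le m\,C_\rho\,\lceil n/m\rceil^{-\alpha}\sum_{x_1,\dots,x_{m-1}\ge1}\ \prod_{i=1}^{m-1}\rho(x_i)=m\,C_\rho\,\lceil n/m\rceil^{-\alpha}\le C_\rho\,m^{\alpha+1}n^{-\alpha},\end{aligned} \tag{2.10}$$

where we use that the largest of $x_1,\dots,x_m$ is at least $\lceil n/m\rceil$. $\square$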
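The bound (2.9) is easy to test numerically. The sketch below assumes the illustrative law $\rho(n)=1/(n(n+1))\le n^{-2}$, so that (2.9) applies with $C_\rho=1$ and $\alpha=2$. It computes $\rho^{*m}(n)$ by iterated convolution and checks $\rho^{*m}(n)\le(C_\rho\vee1)\,m^{\alpha+1}n^{-\alpha}$ on a finite range.

```python
from typing import Callable, List


def convolution_powers(rho: Callable[[int], float], n_max: int, m_max: int) -> List[List[float]]:
    """conv[m][n] = rho^{*m}(n) for 0 <= m <= m_max, 0 <= n <= n_max (rho supported on {1, 2, ...})."""
    conv = [[0.0] * (n_max + 1) for _ in range(m_max + 1)]
    conv[0][0] = 1.0  # rho^{*0} = delta_0
    for m in range(1, m_max + 1):
        for n in range(1, n_max + 1):
            conv[m][n] = sum(conv[m - 1][n - k] * rho(k) for k in range(1, n + 1))
    return conv


def bound_2_9(m: int, n: int, c_rho: float, alpha: float) -> float:
    """Right-hand side of (2.9)."""
    return max(c_rho, 1.0) * m ** (alpha + 1) * n ** (-alpha)
```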



3 Upper bound 

The following upper bound will be used in Section 5 to derive the upper bound in the definition of
the LDP.

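Proposition 3.1. For any $Q\in\mathcal P^{\mathrm{inv,fin}}(\tilde E^{\mathbb N})$ and any $\varepsilon>0$, there is an open neighbourhood
$O(Q)\subset\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})$ of $Q$ such that

$$\limsup_{N\to\infty}\frac1N\log\mathbb P\big(R_N\in O(Q)\,\big|\,X\big)\le-I^{\mathrm{fin}}(Q)+\varepsilon\qquad X\text{-a.s.} \tag{3.1}$$

We remark that since $|E|<\infty$ we automatically have $I^{\mathrm{fin}}(Q)\in[0,\infty)$ for all $Q\in\mathcal P^{\mathrm{inv,fin}}(\tilde E^{\mathbb N})$, so
the right-hand side of (3.1) is finite.

Proof. It suffices to consider the case $\Psi_Q\ne\nu^{\otimes\mathbb N}$. The case $\Psi_Q=\nu^{\otimes\mathbb N}$, for which $I^{\mathrm{fin}}(Q)=H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})$
as is seen from (1.16), is contained in the upper bound in Birkner [2], Lemma 8. Alternatively,
by lower semicontinuity of $Q'\mapsto H(Q'\mid q_{\rho,\nu}^{\otimes\mathbb N})$, there is a neighbourhood $O(Q)$ such that

$$\inf_{Q'\in\overline{O(Q)}}H(Q'\mid q_{\rho,\nu}^{\otimes\mathbb N})\ge H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})-\varepsilon=I^{\mathrm{fin}}(Q)-\varepsilon, \tag{3.2}$$

where $\overline{O(Q)}$ denotes the closure of $O(Q)$ (in the weak topology), and we can use the annealed
bound.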

In Sections 3.1–3.5 we first prove Proposition 3.1 under the assumption that there exist $\alpha\in
(1,\infty)$, $C_\rho<\infty$ such that

$$\rho(n)\le C_\rho\,n^{-\alpha},\qquad n\in\mathbb N, \tag{3.3}$$

which is needed in Lemma 2.3. In Section 3.6 we show that this can be replaced by (1.1). In
Sections 3.1–3.4 we first consider $Q\in\mathcal P^{\mathrm{erg,fin}}(\tilde E^{\mathbb N})$ (recall (1.28)). Here, we turn the heuristics
from Section 1.3 into a rigorous proof. In Section 3.5 we remove the ergodicity restriction. The
proof is long and technical (taking up more than half of the paper).

3.1 Step 1: Consequences of ergodicity 

We will use the ergodic theorem to construct specific neighbourhoods of $Q\in\mathcal P^{\mathrm{erg,fin}}(\tilde E^{\mathbb N})$ that are
well adapted to formalise the strategy of proof outlined in our heuristic explanation of the main
theorem in Section 1.3.

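Fix $\varepsilon_1,\delta_1>0$. By the ergodicity of $Q$ and Lemma 1.7, the event (recall (1.9) and (1.27))

$$\begin{aligned}&\Big\{\tfrac1M|K^{(M)}|\in m_Q+[-\varepsilon_1,\varepsilon_1]\Big\}\\&\cap\Big\{-\tfrac1M\log Q(K^{(M)})\in m_QH(\Psi_Q)+[-\varepsilon_1,\varepsilon_1]\Big\}\\&\cap\Big\{-\tfrac1M\log Q(Y^{(1)},\dots,Y^{(M)})\in H(Q)+[-\varepsilon_1,\varepsilon_1]\Big\}\\&\cap\Big\{\tfrac1M\sum_{k=1}^{|K^{(M)}|}\log\nu\big((K^{(M)})_k\big)\in m_QE_{\Psi_Q}[\log\nu(X_1)]+[-\varepsilon_1,\varepsilon_1]\Big\}\\&\cap\Big\{\tfrac1M\sum_{i=1}^{M}\log\rho(\tau_i)\in E_Q[\log\rho(\tau_1)]+[-\varepsilon_1,\varepsilon_1]\Big\}\end{aligned} \tag{3.4}$$

has $Q$-probability at least $1-\delta_1/4$ for $M$ large enough (depending on $Q$), where $|K^{(M)}|$ is the
length of the string of letters $K^{(M)}$. Hence, there is a finite number $A$ of sentences of length $M$,
denoted by

$$(z_a)_{a=1,\dots,A}\quad\text{with}\quad z_a:=\big(y^{(a,1)},\dots,y^{(a,M)}\big)\in\tilde E^M, \tag{3.5}$$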



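such that for $a=1,\dots,A$,

$$\begin{aligned}&|\kappa(z_a)|\in\big[M(m_Q-\varepsilon_1),\,M(m_Q+\varepsilon_1)\big],\\&Q\big(K^{(M)}=\kappa(z_a)\big)\in\big[\exp[-M(m_QH(\Psi_Q)+\varepsilon_1)],\,\exp[-M(m_QH(\Psi_Q)-\varepsilon_1)]\big],\\&Q\big((Y^{(1)},\dots,Y^{(M)})=z_a\big)\in\big[\exp[-M(H(Q)+\varepsilon_1)],\,\exp[-M(H(Q)-\varepsilon_1)]\big],\\&\sum_{k=1}^{|\kappa(z_a)|}\log\nu\big((\kappa(z_a))_k\big)\in\big[M(m_QE_{\Psi_Q}[\log\nu(X_1)]-\varepsilon_1),\,M(m_QE_{\Psi_Q}[\log\nu(X_1)]+\varepsilon_1)\big],\\&\sum_{i=1}^{M}\log\rho\big(|y^{(a,i)}|\big)\in\big[M(E_Q[\log\rho(\tau_1)]-\varepsilon_1),\,M(E_Q[\log\rho(\tau_1)]+\varepsilon_1)\big],\end{aligned} \tag{3.6}$$

and

$$\sum_{a=1}^AQ\big((Y^{(1)},\dots,Y^{(M)})=z_a\big)\ge1-\frac{\delta_1}2. \tag{3.7}$$

Note that (3.7) and the third line of (3.6) imply that

$$A\in\Big[\Big(1-\frac{\delta_1}2\Big)\exp[M(H(Q)-\varepsilon_1)],\ \exp[M(H(Q)+\varepsilon_1)]\Big]. \tag{3.8}$$

Abbreviate

$$\mathscr A:=\{z_a,\ a=1,\dots,A\}. \tag{3.9}$$

Let

$$\mathscr B:=\{\zeta^{(b)},\ b=1,\dots,B\}=\{\kappa(z_a),\ a=1,\dots,A\} \tag{3.10}$$

be the set of strings of letters arising from concatenations of the individual $z_a$'s, and let

$$I_b:=\{1\le a\le A\colon\kappa(z_a)=\zeta^{(b)}\},\qquad b=1,\dots,B, \tag{3.11}$$

so that $|I_b|$ is the number of sentences in $\mathscr A$ giving a particular string in $\mathscr B$. By the second line of
(3.6), we can bound $B$ as

$$B\le\exp[M(m_QH(\Psi_Q)+\varepsilon_1)], \tag{3.12}$$

because $\sum_{b=1}^BQ(K^{(M)}=\zeta^{(b)})\le1$ and each summand is at least $\exp[-M(m_QH(\Psi_Q)+\varepsilon_1)]$.
Furthermore, we have

$$|I_b|\le\exp[M(H_{\tau|K}(Q)+2\varepsilon_1)],\qquad b=1,\dots,B, \tag{3.13}$$

since

$$\exp[-M(m_QH(\Psi_Q)-\varepsilon_1)]\ge Q\big(\kappa(Y^{(1)},\dots,Y^{(M)})=\zeta^{(b)}\big)\ge\sum_{a\in I_b}Q\big((Y^{(1)},\dots,Y^{(M)})=z_a\big)\ge|I_b|\exp[-M(H(Q)+\varepsilon_1)], \tag{3.14}$$

and $H(Q)-m_QH(\Psi_Q)=H_{\tau|K}(Q)$ by (1.30).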
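The counting in (3.10)–(3.14) amounts to grouping the sentences in $\mathscr A$ by their concatenation. It then uses the fact that probabilities which sum to at most 1, each of size at least $e^{-c}$, can number at most $e^{c}$. A small Python sketch of the grouping step on a toy set of sentences (data and names invented for illustration):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Word = Tuple[str, ...]
Sentence = Tuple[Word, ...]


def concat(sentence: Sentence) -> str:
    """The concatenation map kappa applied to a finite sentence of words."""
    return "".join("".join(w) for w in sentence)


def group_by_concatenation(sentences: List[Sentence]) -> Dict[str, List[Sentence]]:
    """Map each string zeta^(b) in B to the sentences z_a with kappa(z_a) = zeta^(b), cf. (3.10)-(3.11)."""
    groups: Dict[str, List[Sentence]] = defaultdict(list)
    for z in sentences:
        groups[concat(z)].append(z)
    return dict(groups)
```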
3.2 Step 2: Good sentences in open neighbourhoods

Define the following open neighbourhood of $Q$ (recall (3.9)):

$$O:=\big\{Q'\in\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})\colon Q'(\mathscr A)>1-\delta_1\big\}. \tag{3.15}$$



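Here, $Q'(\mathscr A)$ is shorthand for $Q'((Y^{(1)},\dots,Y^{(M)})\in\mathscr A)$. For $x\in E^{\mathbb N}$ and for a vector of cut-points
$(j_1,\dots,j_N)\in\mathbb N^N$ with $0<j_1<\dots<j_N<\infty$ and $N>M$, let

$$\xi_N:=(\xi^{(i)})_{i=1,\dots,N}:=\big(x|_{(0,j_1]},\,x|_{(j_1,j_2]},\dots,x|_{(j_{N-1},j_N]}\big)\in\tilde E^N \tag{3.16}$$

(with $(0,j_1]$ shorthand notation for $(0,j_1]\cap\mathbb N$, etc.) be the sequence of words obtained by cutting
$x$ at the positions $j_i$, and let

$$R_N^{j_1,\dots,j_N}(x):=\frac1N\sum_{i=0}^{N-1}\delta_{\tilde\theta^i(\xi_N)^{\mathrm{per}}} \tag{3.17}$$

be the corresponding empirical process. By (3.15), if $R_N^{j_1,\dots,j_N}(x)\in O$, then

$$\#\big\{1\le i\le N-M\colon\big(x|_{(j_{i-1},j_i]},\dots,x|_{(j_{i+M-2},j_{i+M-1}]}\big)\in\mathscr A\big\}\ge N(1-\delta_1)-M \tag{3.18}$$

(with $j_0:=0$); indeed, more than $N(1-\delta_1)$ of the $N$ shifts in (3.17) start with a sentence from $\mathscr A$, and
(3.18) discards only $M$ of them. Note that (3.18) implies that the sentence $\xi_N$ contains at least

$$C:=\lfloor(1-\delta_1)N/M\rfloor-1 \tag{3.19}$$

disjoint subsentences from the set $\mathscr A$, i.e., there are $1\le i_1,\dots,i_C\le N-M$ with $i_c-i_{c-1}\ge M$
for $c=1,\dots,C$ such that

$$\big(\xi^{(i_c)},\dots,\xi^{(i_c+M-1)}\big)\in\mathscr A,\qquad c=1,\dots,C \tag{3.20}$$

(we implicitly assume that $N$ is large enough so that $C\ge1$). Indeed, we can e.g. construct the $i_c$'s
iteratively as

$$i_0=-M,\qquad i_c=\min\big\{k\ge i_{c-1}+M\colon\text{a sentence from }\mathscr A\text{ starts at position }k\text{ in }\xi_N\big\},\qquad c=1,\dots,C, \tag{3.21}$$

and we can continue the iteration as long as $cM+\delta_1N<N$. But (3.20) in turn implies that the
$j_i$'s cut out of $x$ at least $C$ disjoint subwords from $\mathscr B$, i.e.,

$$x|_{(j_{i_c-1},\,j_{i_c+M-1}]}\in\mathscr B,\qquad c=1,\dots,C. \tag{3.22}$$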
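The iteration (3.21) is a greedy selection. The Python sketch below (hypothetical names) takes a boolean list marking the word positions $k$ at which a sentence from $\mathscr A$ starts in $\xi_N$, and returns $i_1<i_2<\dots$. Every selected position rules out at most the next $M-1$ positions, so at least $G/M$ positions are selected when $G$ positions are marked. With $G\ge N(1-\delta_1)-M$ from (3.18), this yields at least $C$ disjoint subsentences as in (3.19).

```python
from typing import List, Sequence


def greedy_disjoint_starts(good: Sequence[bool], M: int) -> List[int]:
    """good[k-1] is True iff a sentence from A starts at word position k (1-based).
    Returns i_1 < i_2 < ... with i_c = min{k >= i_{c-1} + M : good at k} and i_0 = -M, as in (3.21)."""
    chosen: List[int] = []
    next_allowed = 0  # i_0 + M with i_0 = -M
    for k in range(1, len(good) + 1):
        if k >= next_allowed and good[k - 1]:
            chosen.append(k)
            next_allowed = k + M
    return chosen
```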

3.3 Step 3: Estimate of the large deviation probability

Using Steps 1 and 2, we estimate (recall (3.15))

$$\mathbb P(R_N\in O\mid X)=\sum_{0<j_1<\dots<j_N<\infty}1_O\big(R_N^{j_1,\dots,j_N}(X)\big)\prod_{i=1}^N\rho(j_i-j_{i-1}) \tag{3.23}$$

from above as follows. Fix a vector of cut-points $(j_1,\dots,j_N)$ giving rise to a non-zero contribution
in the right-hand side of (3.23). We think of this vector as describing a particular way of cutting $X$
into a sentence of $N$ words. By (3.22), at least $C$ (recall (3.19)) of the $j_i$'s must be cut-points where
a word from $\mathscr B$ is written on $X$, and these $C$ subwords must be disjoint. As words in $\mathscr B$ arise from
concatenations of sentences from $\mathscr A$, this means we can find

$$\ell_1<\dots<\ell_C\quad\text{with}\quad\{\ell_1,\dots,\ell_C\}\subset\{0,j_1,\dots,j_N\}\quad\text{and}\quad\zeta_1,\dots,\zeta_C\in\mathscr A \tag{3.24}$$

such that

$$X|_{(\ell_c,\,\ell_c+|\kappa(\zeta_c)|]}=\kappa(\zeta_c)\in\mathscr B,\quad c=1,\dots,C,\qquad\text{and}\qquad\ell_c\ge\ell_{c-1}+|\kappa(\zeta_{c-1})|,\quad c=2,\dots,C. \tag{3.25}$$

We call $\zeta_1,\dots,\zeta_C$ the good subsentences.

Figure 2: Looking for good subsentences and filling subsentences (see below (3.25)).

Note that once we fix the $\ell_c$'s and the $\zeta_c$'s, this determines $C+1$ filling subsentences (some of
which may be empty) consisting of the words between the good subsentences. See Figure 2 for an
illustration. In particular, this determines numbers $m_1,\dots,m_{C+1}\in\mathbb N_0$ such that $m_1+\dots+m_{C+1}=
N-CM$, where $m_c$ is the number of words we cut between the $(c-1)$-st and the $c$-th good
subsentence (and $m_{C+1}$ is the number of words after the $C$-th good subsentence).

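Next, let us fix "good" $\ell_1<\dots<\ell_C$ and $\eta^{(1)},\dots,\eta^{(C)}\in\mathscr B$, satisfying

$$X|_{(\ell_c,\,\ell_c+|\eta^{(c)}|]}=\eta^{(c)},\qquad\ell_c\ge\ell_{c-1}+|\eta^{(c-1)}|,\qquad c=1,\dots,C \tag{3.26}$$

(with the convention $\ell_0+|\eta^{(0)}|:=0$).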



To estimate how many different choices of $(j_1,\dots,j_N)$ may lead to this particular $((\ell_c),(\eta^{(c)}))$, we
proceed as follows. There are at most

$$\Big(2M\varepsilon_1\exp[M(H_{\tau|K}(Q)+2\varepsilon_1)]\Big)^C\le\exp[N(H_{\tau|K}(Q)+\delta_2)] \tag{3.27}$$

possible choices for the word lengths inside these good subsentences. Indeed, by the first line of
(3.6), at most $2M\varepsilon_1$ different elements of $\mathscr B$ can start at any given position $\ell_c$ and, by (3.13), each
of them can be cut in at most $\exp[M(H_{\tau|K}(Q)+2\varepsilon_1)]$ different ways to obtain an element of $\mathscr A$.
In (3.27), $\delta_2=\delta_2(\varepsilon_1,\delta_1,M)$ can be made arbitrarily small by choosing $M$ large and $\varepsilon_1,\delta_1$ small.
Furthermore, there are at most

$$\binom{N-C(M-1)}{C}\le\exp[\delta_3N] \tag{3.28}$$

possible choices of the $m_c$'s, where $\delta_3=\delta_3(\delta_1,M)$ can be made arbitrarily small by choosing $M$
large and $\delta_1$ small.
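The count in (3.28) is a stars-and-bars identity. The number of $(m_1,\dots,m_{C+1})\in\mathbb N_0^{C+1}$ with $m_1+\dots+m_{C+1}=N-CM$ equals $\binom{N-CM+C}{C}=\binom{N-C(M-1)}{C}$. Moreover, $\binom nk\le(en/k)^k$ and $C/N\le1/M$ give $\frac1N\log\binom{N-C(M-1)}{C}\le\frac1M\log(eM)$, which is why $\delta_3$ becomes small for large $M$. A brute-force Python check on small parameters (the function name is ours):

```python
import math
from itertools import product


def count_fillings(N: int, C: int, M: int) -> int:
    """Number of (m_1, ..., m_{C+1}) with m_c >= 0 and m_1 + ... + m_{C+1} = N - C*M."""
    total = N - C * M
    if total < 0:
        return 0
    return sum(1 for ms in product(range(total + 1), repeat=C + 1) if sum(ms) == total)
```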



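Next, we estimate the value of $\prod_{i=1}^N\rho(j_i-j_{i-1})$ for any $(j_1,\dots,j_N)$ leading to the given
$((\ell_c),(\eta^{(c)}))$. In view of the fifth line of (3.6), we have

$$\prod_{i=1}^N\rho(j_i-j_{i-1})^{1\{\text{the }i\text{-th word falls inside the }C\text{ good subsentences}\}}\le\exp[CM(E_Q[\log\rho(\tau_1)]+\varepsilon_1)]\le\exp[N(E_Q[\log\rho(\tau_1)]+\delta_4)], \tag{3.29}$$

where $\delta_4=\delta_4(\varepsilon_1,\delta_1,M)$ can be made arbitrarily small by choosing $M$ large and $\varepsilon_1,\delta_1$ small. The
filling subsentences have to exactly fill up the gaps between the good subsentences and so, for a given
choice of $(\ell_c)$, $(\eta^{(c)})$ and $(m_c)$, the contribution to $\prod_{i=1}^N\rho(j_i-j_{i-1})$ from the filling subsentences is
$\prod_{c=1}^C\rho^{*m_c}(\ell_c-\ell_{c-1}-|\eta^{(c-1)}|)$ (the term for $c=1$ is to be interpreted as $\rho^{*m_1}(\ell_1)$, and $\rho^{*0}$ as $\delta_0$).
By Lemma 2.3, using (3.3),

$$\begin{aligned}\prod_{c=1}^C\rho^{*m_c}\big(\ell_c-\ell_{c-1}-|\eta^{(c-1)}|\big)&\le(C_\rho\vee1)^C\Big(\prod_{c=1}^Cm_c^{\alpha+1}\Big)\prod_{c=1}^C\Big(\big(\ell_c-\ell_{c-1}-|\eta^{(c-1)}|\big)\vee1\Big)^{-\alpha}\\&\le(C_\rho\vee1)^C\Big(\frac{N-CM}{C}\Big)^{(\alpha+1)C}\prod_{c=1}^C\Big(\big(\ell_c-\ell_{c-1}-|\eta^{(c-1)}|\big)\vee1\Big)^{-\alpha}\\&\le\exp[N\delta_5]\prod_{c=1}^C\Big(\big(\ell_c-\ell_{c-1}-|\eta^{(c-1)}|\big)\vee1\Big)^{-\alpha},\end{aligned} \tag{3.30}$$

where $\delta_5=\delta_5(\delta_1,M)$ can be made arbitrarily small by choosing $M$ large and $\delta_1$ small. For the
second inequality, we have used the fact that the product $\prod_{c=1}^Cm_c^{\alpha+1}$ is maximal when all factors
are equal.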

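Combining (3.23)–(3.30), we obtain

$$\mathbb P(R_N\in O\mid X)\le\exp\big[N\big(H_{\tau|K}(Q)+E_Q[\log\rho(\tau_1)]+\delta_2+\delta_3+\delta_4+\delta_5\big)\big]\sum_{\substack{(\ell_c),\,(\eta^{(c)})\\\text{good}}}\ \prod_{c=1}^C\Big(\big(\ell_c-\ell_{c-1}-|\eta^{(c-1)}|\big)\vee1\Big)^{-\alpha}. \tag{3.31}$$

Combining (3.31) with Lemma 3.2 below, and recalling the identity in (1.32), we obtain the result
in Proposition 3.1 for $\rho$ satisfying (3.3), with $O$ defined in (3.15) and $\varepsilon=\delta_2+\delta_3+\delta_4+\delta_5+\delta_6$.
Note that $\varepsilon$ can be made arbitrarily small by choosing $\varepsilon_1,\delta_1$ small and $M$ large.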

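3.4 Step 4: Cost of finding good sentences

Lemma 3.2. For $\varepsilon_1,\delta_1>0$ and $M\in\mathbb N$,

$$\limsup_{N\to\infty}\frac1N\log\sum_{\substack{(\ell_c),\,(\eta^{(c)})\\\text{good}}}\ \prod_{c=1}^C\Big(\big(\ell_c-\ell_{c-1}-|\eta^{(c-1)}|\big)\vee1\Big)^{-\alpha}\le-\alpha\,m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})+\delta_6\qquad\text{a.s.}, \tag{3.32}$$

where $\delta_6=\delta_6(\varepsilon_1,\delta_1,M)$ can be made arbitrarily small by choosing $M$ large and $\varepsilon_1,\delta_1$ small.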
Proof. Note that, by the fourth line of (3.6), for any $\eta\in\mathscr B$ (recall (3.10)) and $k\in\mathbb N$,

$$\mathbb P(\eta\text{ starts at position }k\text{ in }X)\le\exp[M(m_QE_{\Psi_Q}[\log\nu(X_1)]+\varepsilon_1)]. \tag{3.33}$$

Combining this with (3.12), we get

$$\begin{aligned}\mathbb P(\text{some element of }\mathscr B\text{ starts at position }k\text{ in }X)&\le\exp[M(m_QE_{\Psi_Q}[\log\nu(X_1)]+\varepsilon_1)]\times\exp[M(m_QH(\Psi_Q)+\varepsilon_1)]\\&=\exp[-M(m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})-2\varepsilon_1)],\end{aligned} \tag{3.34}$$

where we use (1.26).

Next, we coarse-grain the sequence $X$ into blocks of length

$$L:=\lfloor M(m_Q-\varepsilon_1)\rfloor, \tag{3.35}$$

and compare the coarse-grained sequence with a low-density Bernoulli sequence. To this end, define
a $\{0,1\}$-valued sequence $(A_l)_{l\in\mathbb N}$ inductively as follows. Put $A_0:=0$, and, for $l\in\mathbb N$ given that
$A_0,A_1,\dots,A_{l-1}$ have been assigned values, define $A_l$ by distinguishing the following two cases:

(1) If $A_{l-1}=0$, then

$$A_l:=\begin{cases}1, & \text{if in }X\text{ there is a word }\eta\in\mathscr B\text{ starting in }((l-1)L,lL],\\0, & \text{otherwise}.\end{cases} \tag{3.36}$$

(2) If $A_{l-1}=1$, then

$$A_l:=\begin{cases}1, & \text{if in }X\text{ there are words }\eta,\eta'\in\mathscr B\text{ starting in }((l-2)L,(l-1)L],\\ & \text{respectively, }((l-1)L,lL],\text{ and occurring disjointly},\\0, & \text{otherwise}.\end{cases} \tag{3.37}$$

Put

$$p:=L\exp[-M(m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})-2\varepsilon_1)]. \tag{3.38}$$

Then we claim

$$\mathbb P(A_1=a_1,\dots,A_n=a_n)\le p^{a_1+\dots+a_n},\qquad n\in\mathbb N,\ a_1,\dots,a_n\in\{0,1\}. \tag{3.39}$$

In order to verify (3.39), fix $a_1,\dots,a_n\in\{0,1\}$ with $a_1+\dots+a_n=m$. By construction, for the
event in the left-hand side of (3.39) to occur there must be $m$ non-overlapping elements of $\mathscr B$ at
certain positions in $X$. By (3.34), and since $X$ is i.i.d. so that occurrences on disjoint stretches are
independent, the occurrence of any $m$ fixed starting positions has probability at most

$$\exp[-mM(m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})-2\varepsilon_1)], \tag{3.40}$$

while the choice of the $a_l$'s dictates that there are at most $L^m$ possibilities for the starting points
of the $m$ words.
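The inductive definition (3.36)–(3.37) translates directly into code. In the sketch below, which is only an illustration, the occurrences of elements of $\mathscr B$ in $X$ are assumed to be given as a list of (start, length) pairs with 1-based start positions. The function returns $A_1,\dots,A_n$ for block length $L$.

```python
from typing import List, Sequence, Tuple


def coarse_grain(occurrences: Sequence[Tuple[int, int]], L: int, n: int) -> List[int]:
    """A_1, ..., A_n as in (3.36)-(3.37); an occurrence (s, k) covers letters s, ..., s+k-1."""

    def starting_in(l: int) -> List[Tuple[int, int]]:
        # occurrences whose start lies in the block ((l-1)L, lL]
        return [(s, k) for s, k in occurrences if (l - 1) * L < s <= l * L]

    A = [0]  # A_0 = 0
    for l in range(1, n + 1):
        if A[-1] == 0:
            A.append(1 if starting_in(l) else 0)  # case (1)
        else:
            disjoint_pair = any(s + k <= s2  # eta ends before eta' starts
                                for s, k in starting_in(l - 1)
                                for s2, _ in starting_in(l))
            A.append(1 if disjoint_pair else 0)  # case (2)
    return A[1:]
```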

By (3.39), we can couple the sequence $(A_l)_{l\in\mathbb N}$ with an i.i.d. Bernoulli($p$)-sequence $(\omega_l)_{l\in\mathbb N}$ such
that

$$A_l\le\omega_l\qquad\forall\,l\in\mathbb N\quad\text{a.s.} \tag{3.41}$$

(Note that (3.39) guarantees the existence of such a coupling for any fixed $n$. In order to extend
this existence to the infinite sequence, observe that the set of functions depending on finitely many
coordinates is dense in the set of continuous increasing functions on $\{0,1\}^{\mathbb N}$, and use the results in
Strassen [11].)

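Each admissible choice of $\ell_1,\dots,\ell_C$ in (3.32) leads to a $C$-tuple $l_1<\dots<l_C$ such that
$A_{l_1}=\dots=A_{l_C}=1$ (since it cuts out non-overlapping words, which is compatible with (3.36)–(3.37)),
and for any such $(l_1,\dots,l_C)$ there are at most $L^C$ different admissible choices of the $\ell_c$'s.
Thus, we have

$$\sum_{\substack{(\ell_c),\,(\eta^{(c)})\\\text{good}}}\ \prod_{c=1}^C\Big(\big(\ell_c-\ell_{c-1}-|\eta^{(c-1)}|\big)\vee1\Big)^{-\alpha}\le L^CL^{-\alpha C}\sum_{0<l_1<\dots<l_C<\infty}\ \prod_{c=1}^C\omega_{l_c}\,(l_c-l_{c-1})^{-\alpha}. \tag{3.42}$$

Using (3.19) and recalling the definition of $\phi(\alpha,p)$ in (2.2), we have

$$\limsup_{N\to\infty}\frac1N\log\big[\text{r.h.s. of (3.42)}\big]\le\frac{1-\delta_1}{M}\big(\log(Mm_Q)-\phi(\alpha,p)\big)\qquad(\omega,A)\text{-a.s.} \tag{3.43}$$

From (3.38) we know that $\log(1/p)\sim M(m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})-2\varepsilon_1)$ as $M\to\infty$ and so, by Lemma 2.1,
we have

$$\text{r.h.s. of (3.43)}\le-(1-\delta_1)(1-\varepsilon_2)\,\alpha\,\big(m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})-2\varepsilon_1\big) \tag{3.44}$$

for any $\varepsilon_2\in(0,1)$, provided $M$ is large enough. This completes the proof of Lemma 3.2, and hence
of Proposition 3.1 for $Q\in\mathcal P^{\mathrm{erg,fin}}(\tilde E^{\mathbb N})$. $\square$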

3.5 Step 5: Removing the assumption of ergodicity 

Sections 3.1–3.4 contain the main ideas behind the proof of Proposition 3.1. In the present section
we extend the bound from $\mathcal P^{\mathrm{erg,fin}}(\tilde E^{\mathbb N})$ to $\mathcal P^{\mathrm{inv,fin}}(\tilde E^{\mathbb N})$. This requires setting up a variant of the
argument in Sections 3.1–3.4 in which the ergodic components of $Q$ are "approximated with a
common length scale on the letter level". This turns out to be technically involved and to split
into 6 substeps.

Let $Q\in\mathcal P^{\mathrm{inv,fin}}(\tilde E^{\mathbb N})$ have a non-trivial ergodic decomposition

$$Q=\int_{\mathcal P^{\mathrm{erg}}(\tilde E^{\mathbb N})}Q'\,W_Q(dQ'), \tag{3.45}$$

where $W_Q$ is a probability measure on $\mathcal P^{\mathrm{erg}}(\tilde E^{\mathbb N})$ (Georgii [7], Proposition 7.22). We may assume
w.l.o.g. that $H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})<\infty$, otherwise we can simply employ the annealed bound. Thus, $W_Q$ is
in fact supported on $\mathcal P^{\mathrm{erg,fin}}(\tilde E^{\mathbb N})\cap\{Q'\colon H(Q'\mid q_{\rho,\nu}^{\otimes\mathbb N})<\infty\}$.

Fix $\varepsilon>0$. In the following steps, we will construct an open neighbourhood $O(Q)\subset\mathcal P^{\mathrm{inv}}(\tilde E^{\mathbb N})$ of
$Q$ satisfying (3.1) (for technical reasons with $\varepsilon$ replaced by some $\varepsilon'=\varepsilon'(\varepsilon)$ that becomes arbitrarily
small as $\varepsilon\downarrow0$).

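3.5.1 Preliminaries

Observing that

$$m_Q=\int m_{Q'}\,W_Q(dQ')<\infty,\qquad H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})=\int H(Q'\mid q_{\rho,\nu}^{\otimes\mathbb N})\,W_Q(dQ')<\infty, \tag{3.46}$$

we can find $K_0,K_1,m^*>0$ and a compact set

$$\mathscr C\subset\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})\cap\operatorname{supp}(W_Q)\cap\{Q'\colon H(Q'\mid q_{\rho,\nu}^{\otimes\mathbb N})\le K_0\} \tag{3.47}$$

such that

$$\sup\{H(\Psi_P\mid\nu^{\otimes\mathbb N})\colon P\in\mathscr C\}\le K_1, \tag{3.48}$$

$$\sup\{m_P\colon P\in\mathscr C\}\le m^*, \tag{3.49}$$

$$\text{the family of laws of }\tau_1\text{ under }P,\ P\in\mathscr C,\text{ is uniformly integrable}, \tag{3.50}$$

$$W_Q(\mathscr C)>1-\varepsilon/2, \tag{3.51}$$

$$\int_{\mathscr C}H(Q'\mid q_{\rho,\nu}^{\otimes\mathbb N})\,W_Q(dQ')>H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})-\varepsilon/2, \tag{3.52}$$

$$\int_{\mathscr C}m_{Q'}H(\Psi_{Q'}\mid\nu^{\otimes\mathbb N})\,W_Q(dQ')>m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})-\varepsilon/2. \tag{3.53}$$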

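In order to check (3.50), observe that $\mathbb E_Q[\tau_1]<\infty$ implies that there is a sequence $(c_n)$ with $\lim_{n\to\infty}c_n=\infty$ such that

$$\mathbb E_Q\big[\tau_11_{\{\tau_1>c_n\}}\big]\le\frac{\varepsilon}{n2^n},\qquad n\in\mathbb N. \tag{3.54}$$

Put

$$A_n:=\big\{Q'\in\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})\colon\mathbb E_{Q'}[\tau_11_{\{\tau_1>c_n\}}]>1/n\big\} \tag{3.55}$$

and $A:=\bigcap_{n\in\mathbb N}(A_n)^c$. Each $A_n$ is open, hence $A$ is closed, and by the Markov inequality (together with (3.45)) we have

$$W_Q\big(\{Q'\colon\mathbb E_{Q'}[\tau_11_{\{\tau_1>c_n\}}]>1/n\}\big)\le n\,\mathbb E_Q\big[\tau_11_{\{\tau_1>c_n\}}\big]\le\frac{\varepsilon}{2^n}. \tag{3.56}$$

Thus,

$$W_Q(A^c)=W_Q\Big(\bigcup_{n\in\mathbb N}A_n\Big)\le\sum_{n\in\mathbb N}\frac{\varepsilon}{2^n}=\varepsilon. \tag{3.57}$$

Since $\mathbb E_P[\tau_11_{\{\tau_1>c_n\}}]\le1/n$ for all $n\in\mathbb N$ and all $P\in A$, we may take $\mathscr C\subset A$, which gives (3.50). This in turn implies that the mapping

$$Q'\mapsto m_{Q'}H(\Psi_{Q'}\mid\nu^{\otimes\mathbb N})\ \text{is lower semicontinuous on }\mathscr C. \tag{3.58}$$

Indeed, if $w\text{-}\lim_{n\to\infty}Q_n=Q''$ and $(Q_n)\subset\mathscr C$, then $\lim_{n\to\infty}\mathbb E_{Q_n}[\tau_1]=\lim_{n\to\infty}m_{Q_n}=m_{Q''}=\mathbb E_{Q''}[\tau_1]$ and $w\text{-}\lim_{n\to\infty}\Psi_{Q_n}=\Psi_{Q''}$ by uniform integrability (see Birkner [2], Remark 7), and it remains to use the lower semicontinuity of $H(\cdot\mid\nu^{\otimes\mathbb N})$.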
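Choosing the thresholds $(c_n)$ in (3.54) is elementary. The minimal sketch below assumes, purely for illustration, that the law of $\tau_1$ under $Q$ is given as a finite table `pmf` (word length to probability). The helper names `tail_mean` and `choose_thresholds` are ours and do not come from the paper. For each $n$ the code picks the smallest integer $c_n$ with $\mathbb E_Q[\tau_11_{\{\tau_1>c_n\}}]\le\varepsilon/(n2^n)$. This is exactly what the Markov bound (3.56) and the summation (3.57) use.

```python
def tail_mean(pmf, c):
    """E[tau_1 * 1{tau_1 > c}] for a word-length law given as {length: probability}."""
    return sum(k * p for k, p in pmf.items() if k > c)


def choose_thresholds(pmf, eps, n_max):
    """Return [c_1, ..., c_{n_max}] with tail_mean(pmf, c_n) <= eps / (n * 2**n).

    The tail mean decreases in c and the target decreases in n, so the smallest
    admissible c_n is never below c_{n-1}; the search therefore resumes from the
    previous threshold. For a finitely supported pmf the loop terminates because
    the tail mean vanishes beyond the largest length.
    """
    thresholds = []
    c = 0
    for n in range(1, n_max + 1):
        target = eps / (n * 2 ** n)
        while tail_mean(pmf, c) > target:
            c += 1
        thresholds.append(c)
    return thresholds


# Hypothetical example: a truncated, normalised law with rho(k) proportional to k^(-3).
weights = {k: k ** -3.0 for k in range(1, 2001)}
total = sum(weights.values())
pmf = {k: w / total for k, w in weights.items()}
eps = 0.1
cs = choose_thresholds(pmf, eps, n_max=5)
# Markov step (3.56): W_Q(A_n) <= n * tail_mean(pmf, c_n) <= eps / 2**n.
markov_bounds = [n * tail_mean(pmf, c) for n, c in enumerate(cs, start=1)]
```

The sum of `markov_bounds` stays below `eps`, which mirrors (3.57).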

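Furthermore, we can find $N_0,L_0\in\mathbb N$ with $L_0<N_0$ and a finite set $W\subset\widetilde E^{N_0}$ such that the following holds. Let

$$\widetilde W:=\big\{\pi_{L_0}\big(\theta^i\kappa(\zeta)\big)\colon\zeta=(\zeta^{(1)},\dots,\zeta^{(N_0)})\in W,\ 0\le i<|\zeta^{(1)}|\big\} \tag{3.59}$$

be the set of words of length $L_0$ obtained by concatenating sentences from $W$, possibly shifting the "origin" inside the first word and restricting to the first $L_0$ letters. Then, denoting by $\widehat{\mathscr C}$ the set of all $P\in\mathscr P^{\mathrm{inv,fin}}(\widetilde E^{\mathbb N})\cap\mathscr C$ that satisfy (here $P(\zeta)$ is short for $P(\pi_{N_0}Y=\zeta)$)

$$P(W)>1-\frac{\varepsilon}{3c_{\lceil3/\varepsilon\rceil}},\qquad\forall\,\xi\in\widetilde W\colon\ \Psi_P(\xi)\le\frac{1+\varepsilon/2}{m_P}\,\mathbb E_P\Big[1_W(\pi_{N_0}Y)\sum_{i=0}^{|Y^{(1)}|-1}1_{\{\xi\}}\big(\pi_{L_0}\theta^i\kappa(\pi_{N_0}Y)\big)\Big], \tag{3.60}$$

$$H(P\mid q_{\rho,\nu}^{\otimes\mathbb N})+\frac\varepsilon4\ge\frac1{N_0}\sum_{\zeta\in W}P(\zeta)\log\frac{P(\zeta)}{q_{\rho,\nu}^{\otimes N_0}(\zeta)}\ge H(P\mid q_{\rho,\nu}^{\otimes\mathbb N})-\frac\varepsilon4, \tag{3.61}$$

$$m_PH(\Psi_P\mid\nu^{\otimes\mathbb N})+\frac\varepsilon4\ge\frac{m_P}{L_0}\sum_{\xi\in\widetilde W}\Psi_P(\xi)\log\frac{\Psi_P(\xi)}{\nu^{\otimes L_0}(\xi)}\ge m_PH(\Psi_P\mid\nu^{\otimes\mathbb N})-\frac\varepsilon4, \tag{3.62}$$

we can choose $N_0$, $L_0$ and $W$ so large that the following inequalities hold:

$$W_Q(\widehat{\mathscr C})>1-3\varepsilon/4, \tag{3.63}$$

$$\int_{\widehat{\mathscr C}}H(P\mid q_{\rho,\nu}^{\otimes\mathbb N})\,W_Q(dP)>H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})-3\varepsilon/4, \tag{3.64}$$

$$\int_{\widehat{\mathscr C}}m_PH(\Psi_P\mid\nu^{\otimes\mathbb N})\,W_Q(dP)>m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})-3\varepsilon/4. \tag{3.65}$$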



We may choose the set $W$ in such a way that

$$\delta_W:=\min\{q_{\rho,\nu}^{\otimes N_0}(\zeta)\colon\zeta\in W\}\cdot\frac{\min\{\nu^{\otimes L_0}(\xi)\colon\xi\in\widetilde W\}}{\max\{|\zeta^{(1)}|\colon\zeta\in W\}}\cdot\frac1{|W|}>0. \tag{3.66}$$
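The construction (3.59) of $\widetilde W$ can be made concrete with a small Python sketch. For illustration only, it assumes that letters are characters and words are non-empty strings. Under that assumption $\kappa$ is string concatenation and $\theta^i$ drops the first $i$ letters. The function names are ours. The length guard never triggers for sentences of $N_0$ words: $L_0<N_0$, every word has at least one letter, and the shift stays inside the first word.

```python
def kappa(sentence):
    """Concatenation map: a sentence (tuple of words) -> its letter string."""
    return "".join(sentence)


def shifted_windows(sentences, L0):
    """W-tilde of (3.59): the first L0 letters of theta^i kappa(zeta), 0 <= i < |zeta^(1)|."""
    windows = set()
    for zeta in sentences:
        letters = kappa(zeta)
        for i in range(len(zeta[0])):  # shift the origin inside the first word only
            window = letters[i:i + L0]
            if len(window) == L0:  # guard; cannot fail when L0 < number of words
                windows.add(window)
    return windows


W = {("ab", "c", "ab"), ("b", "ca", "a")}  # a toy W with N_0 = 3
print(sorted(shifted_windows(W, L0=2)))  # ['ab', 'bc']
```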



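3.5.2 Approximating with a given length scale on the letter level

For $P\in\mathscr P^{\mathrm{inv,fin}}(\widetilde E^{\mathbb N})$, put

$$\delta_P^W:=\delta_W\cdot\Big(\min\{P(\zeta)\colon\zeta\in W,\ P(\zeta)>0\}\wedge\min\{\Psi_P(\xi)\colon\xi\in\widetilde W,\ \Psi_P(\xi)>0\}\Big). \tag{3.67}$$

For $\delta>0$ and $L\in\mathbb N$, we say that $P\in\mathscr P^{\mathrm{inv,fin}}(\widetilde E^{\mathbb N})$ can be $(\delta,L)$-approximated if there exists a finite subset $\mathcal A_P\subset\widetilde E^{\lceil L/m_P\rceil}$ of "$P$-typical" sentences, each consisting of $\lceil L/m_P\rceil$ words (we assume that $L>N_0m_P$), such that

$$P(\mathcal A_P)\ge1-\delta\cdot\delta_P^W \tag{3.68}$$

and, for all $z=(y^{(1)},\dots,y^{(\lceil L/m_P\rceil)})\in\mathcal A_P$,

$$\begin{aligned}
&P(z)\in\big[\exp[-\lceil L/m_P\rceil(H(P)+\delta)],\ \exp[-\lceil L/m_P\rceil(H(P)-\delta)]\big],\\
&|\kappa(z)|\in[L(1-\delta),L(1+\delta)],\\
&\Psi_P\big(\pi_{|\kappa(z)|}(\cdot)=\kappa(z)\big)\in\big[\exp[-L(H(\Psi_P)+\delta)],\ \exp[-L(H(\Psi_P)-\delta)]\big],\\
&\textstyle\sum_{k=1}^{|\kappa(z)|}\log\nu(\kappa(z)_k)\in[L(1-\delta),L(1+\delta)]\,\mathbb E_{\Psi_P}[\log\nu(X_1)],\\
&\textstyle\sum_{i=1}^{\lceil L/m_P\rceil}\log\rho(|y^{(i)}|)\in[(L/m_P)(1-\delta),(L/m_P)(1+\delta)]\,\mathbb E_P[\log\rho(\tau_1)],\\
&|\{z'\in\mathcal A_P\colon\kappa(z)=\kappa(z')\}|\le\exp[(L/m_P)(H_{\tau|K}(P)+\delta)].
\end{aligned} \tag{3.69}$$

By the third and the fourth line of (3.69) we have, using (1.26),

$$\mathbb P\big(X\text{ starts with some element of }\kappa(\mathcal A_P)\big)\le\exp\big[-L(1-2\delta)H(\Psi_P\mid\nu^{\otimes\mathbb N})\big]. \tag{3.70}$$

For $P$ that can be $(\delta,L)$-approximated, define an open neighbourhood of $P$ via

$$U_{(\delta,L)}(P):=\Big\{P'\in\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})\colon\frac{P'(z)}{P(z)}\in\big(1-\delta\cdot\delta_P^W,\,1+\delta\cdot\delta_P^W\big)\ \forall z\in\mathcal A_P\Big\}, \tag{3.71}$$

where $\mathcal A_P=\mathcal A_P(\delta,L)$ is the set from (3.68)–(3.69). By the results of Section 3.1 and the above, for given $P\in\mathscr P^{\mathrm{erg,fin}}(\widetilde E^{\mathbb N})\cap\widehat{\mathscr C}$ and $\delta_0>0$ there exist $\delta'\in(0,\delta_0)$ and $L'$ such that

$$\forall\,L''\ge L'\colon\ P\text{ can be }(\delta',L'')\text{-approximated}. \tag{3.72}$$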
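Assume that a given $P\in\widehat{\mathscr C}$ can be $(\delta,L)$-approximated for some $L$ such that $\lceil L/m_P\rceil>N_0$. We claim that then for any $P'\in\widehat{\mathscr C}\cap U_{(\delta,L)}(P)$,

$$P'\big(\widetilde E^{\lceil L/m_P\rceil}\setminus\mathcal A_P\big)\le2\delta\cdot\delta_P^W, \tag{3.73}$$

$$\forall\,\zeta\in W\colon\ P'(\zeta)\le\begin{cases}(1+3\delta)P(\zeta)&\text{if }P(\zeta)>0,\\[2pt] 2\delta\cdot\delta_P^W\ \big(\le2\delta\min\{q_{\rho,\nu}^{\otimes N_0}(\zeta')\colon\zeta'\in W\}\big)&\text{otherwise},\end{cases} \tag{3.74}$$

$$\forall\,\xi\in\widetilde W\colon\ m_{P'}\Psi_{P'}(\xi)\le\begin{cases}(1+\varepsilon/2)(1+3\delta)\,m_P\Psi_P(\xi)&\text{if }\Psi_P(\xi)>0,\\[2pt](1+\varepsilon/2)\,2\delta\min\{\nu^{\otimes L_0}(\xi')\colon\xi'\in\widetilde W\}&\text{otherwise},\end{cases} \tag{3.75}$$

$$m_{P'}\ge(1-3\delta)(m_P-\varepsilon)\quad\big(\ge(1-3\delta-\varepsilon)m_P\big). \tag{3.76}$$

Indeed, (3.73) follows from (3.68) and (3.71). To verify (3.74), note that, for $\zeta\in W$,

$$\begin{aligned}P'(\zeta)&\le\sum_{z\in\mathcal A_P\colon\pi_{N_0}(z)=\zeta}P'(z)+\sum_{z\notin\mathcal A_P}P'(z)\\&\le(1+\delta)\sum_{z\in\mathcal A_P\colon\pi_{N_0}(z)=\zeta}P(z)+P'\big(\widetilde E^{\lceil L/m_P\rceil}\setminus\mathcal A_P\big),\end{aligned} \tag{3.77}$$

and use (3.73) on the last term in the second line, observing that $\delta_P^W\le P(\zeta)$ whenever $\zeta\in W$ and $P(\zeta)>0$. To verify (3.75), observe that, for $\xi\in\widetilde W$ (recall the definition of $\Psi_{P'}$ from (1.10)), using (3.60),

$$\begin{aligned}(1+\varepsilon/2)^{-1}m_{P'}\Psi_{P'}(\xi)&\le\sum_{\zeta\in W}P'(\zeta)\sum_{i=0}^{|\zeta^{(1)}|-1}1_{\{\xi\}}\big(\pi_{L_0}\theta^i\kappa(\zeta)\big)\\&\le(1+3\delta)\,m_P\Psi_P(\xi)+\sum_{\zeta\in W\colon P(\zeta)=0}|\zeta^{(1)}|\,P'(\zeta),\end{aligned} \tag{3.78}$$

and that the sum in the second line above is bounded by $|W|\cdot\max\{|\zeta^{(1)}|\colon\zeta\in W\}\cdot2\delta\cdot\delta_P^W$, which is not more than $2\delta\,m_P\Psi_P(\xi)$ if $\Psi_P(\xi)>0$ and not more than $2\delta\min\{\nu^{\otimes L_0}(\xi')\colon\xi'\in\widetilde W\}$ otherwise. Lastly, to verify (3.76), note that

$$P'(\zeta)\ge(1-3\delta)P(\zeta)\qquad\forall\,\zeta\in W \tag{3.79}$$

(which can be proved in the same way as (3.74)), so that

$$m_{P'}=\sum_y|y|\,P'(y)\ge\sum_{\zeta\in W}|\zeta^{(1)}|\,P'(\zeta)\ge(1-3\delta)\sum_{\zeta\in W}|\zeta^{(1)}|\,P(\zeta). \tag{3.80}$$

Furthermore,

$$m_P\le\sum_{\zeta\in W}|\zeta^{(1)}|\,P(\zeta)+c_{\lceil3/\varepsilon\rceil}\,P\big(\widetilde E^{N_0}\setminus W\big)+\sum_{y\in\widetilde E\colon|y|>c_{\lceil3/\varepsilon\rceil}}|y|\,P(y). \tag{3.81}$$

Observing that the second and the third term on the right-hand side are each at most $\varepsilon/3$, we find that (3.80)–(3.81) imply (3.76).

Finally, observe that (3.74)–(3.76) imply that there exists $\delta_0$ $(=\delta_0(\varepsilon))>0$ with the following property: For any $P,P'\in\widehat{\mathscr C}$ such that $P$ can be $(\delta,L)$-approximated for some $L$ with $\lceil L/m_P\rceil>N_0$ and $\delta<\delta_0$, and $P'\in U_{(\delta,L)}(P)$, we have

$$H(P'\mid q_{\rho,\nu}^{\otimes\mathbb N})\le(1+\varepsilon)\big(H(P\mid q_{\rho,\nu}^{\otimes\mathbb N})+\varepsilon\big)\quad\text{and} \tag{3.82}$$

$$m_{P'}H(\Psi_{P'}\mid\nu^{\otimes\mathbb N})\le(1+\varepsilon)\big(m_PH(\Psi_P\mid\nu^{\otimes\mathbb N})+\varepsilon\big). \tag{3.83}$$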
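Here, (3.82) follows from the observation that, by (3.61) (applied to $P'$),

$$H(P'\mid q_{\rho,\nu}^{\otimes\mathbb N})-\frac\varepsilon4\le\frac1{N_0}\sum_{\zeta\in W}P'(\zeta)\log\frac{P'(\zeta)}{q_{\rho,\nu}^{\otimes N_0}(\zeta)}, \tag{3.84}$$

where the summands on the right-hand side are controlled by means of (3.74), after which (3.61) is used once more, now for $P$. Similarly, observing that, by (3.62) (applied to $P'$),

$$m_{P'}H(\Psi_{P'}\mid\nu^{\otimes\mathbb N})-\frac\varepsilon4\le\frac{m_{P'}}{L_0}\sum_{\xi\in\widetilde W}\Psi_{P'}(\xi)\log\frac{\Psi_{P'}(\xi)}{\nu^{\otimes L_0}(\xi)}, \tag{3.85}$$

where the summands with $\Psi_P(\xi)>0$ are controlled by the first line of (3.75) and those with $\Psi_P(\xi)=0$ by the second line of (3.75) together with (3.49), we obtain (3.83) in view of (3.62).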

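3.5.3 Approximating the ergodic decomposition

In the previous subsection, we have approximated a given $P\in\mathscr P^{\mathrm{erg,fin}}(\widetilde E^{\mathbb N})$, i.e., we have constructed a certain neighbourhood of $P$ w.r.t. the weak topology, which requires only conditions on the frequencies of sentences whose concatenations are $\approx L$ letters long. While the required $L$ will in general vary with $P$, we now want to construct a compact $\mathscr C'\subset\widehat{\mathscr C}$ such that $W_Q(\mathscr C')$ is still close to 1 and all $P\in\mathscr C'$ can be approximated on the same scale $L$ (on the letter level). To this end, let

$$\mathscr C_{\varepsilon',L'}:=\{P\in\widehat{\mathscr C}\colon P\text{ can be }(\varepsilon',L')\text{-approximated}\}. \tag{3.86}$$

By (3.72), we have

$$\bigcup_{\varepsilon'\in(0,\varepsilon/2)}\ \liminf_{L'\to\infty}\mathscr C_{\varepsilon',L'}=\mathscr P^{\mathrm{erg,fin}}(\widetilde E^{\mathbb N})\cap\widehat{\mathscr C}, \tag{3.87}$$

so, in view of (3.51)–(3.53), we can choose

$$0<\varepsilon_1<\frac{\varepsilon}{2m^*(1\vee K_1)}\wedge\frac{\delta_0}{2} \tag{3.88}$$

and $L\in\mathbb N$ such that

$$W_Q(\mathscr C_{\varepsilon_1,L})>1-\varepsilon, \tag{3.89}$$

$$\int_{\mathscr C_{\varepsilon_1,L}}H(Q'\mid q_{\rho,\nu}^{\otimes\mathbb N})\,W_Q(dQ')>H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})-\varepsilon, \tag{3.90}$$

$$\int_{\mathscr C_{\varepsilon_1,L}}m_{Q'}H(\Psi_{Q'}\mid\nu^{\otimes\mathbb N})\,W_Q(dQ')>m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})-\varepsilon. \tag{3.91}$$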
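For $P\in\mathscr C_{\varepsilon_1,L}$, let

$$U'(P):=\Big\{P'\in\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})\colon\frac{P'(z)}{P(z)}\in\Big(1-\frac{\varepsilon_1}{2}\delta_P^W,\,1+\frac{\varepsilon_1}{2}\delta_P^W\Big)\ \forall z\in\mathcal A_P\Big\}, \tag{3.92}$$

where $\mathcal A_P$ is the set from (3.68)–(3.69) that appears in the definition of $U_{(\varepsilon_1,L)}(P)$, and $\delta_P^W$ is defined in (3.67). Note that $U'(P)\subset U_{(\varepsilon_1,L)}(P)$; in fact, $\inf_{P\in\mathscr C_{\varepsilon_1,L}}\operatorname{dist}\big(U'(P),U_{(\varepsilon_1,L)}(P)^c\big)>0$ if we metrize the weak topology. Consequently,

$$\mathscr C':=\overline{\mathscr C_{\varepsilon_1,L}} \tag{3.93}$$

is compact and satisfies $W_Q(\mathscr C')>1-\varepsilon$, and

$$\big\{U_{(\varepsilon_1,L)}(P)\colon P\in\mathscr C_{\varepsilon_1,L}\big\} \tag{3.94}$$

is an open cover of $\mathscr C'$. By compactness there exist $R\in\mathbb N$ and (pairwise different) $Q_1,\dots,Q_R\in\mathscr P^{\mathrm{erg,fin}}(\widetilde E^{\mathbb N})\cap\widehat{\mathscr C}$ such that

$$U_{(\varepsilon_1,L)}(Q_1)\cup\dots\cup U_{(\varepsilon_1,L)}(Q_R)\supset\mathscr C', \tag{3.95}$$

where $U_{(\varepsilon_1,L)}(Q_r)$ is of the type (3.71) with a set $\mathcal A_r\subset\widetilde E^{M_r}$ satisfying (3.68)–(3.69) with $P$ replaced by $Q_r$, and $M_r=\lceil L/m_{Q_r}\rceil$.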

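For $z\in\bigcup_{n\in\mathbb N}\widetilde E^n$, consider the probability measure on $[0,1]$ given by $\mu_{Q,z}(B):=W_Q\big(\{Q'\in\mathscr P^{\mathrm{erg,fin}}(\widetilde E^{\mathbb N})\colon Q'(z)\in B\}\big)$, $B\subset[0,1]$ measurable. Observing that

$$\bigcup_{r=1}^R\ \bigcup_{z\in\mathcal A_r}\{u\in[0,1]\colon u\text{ is an atom of }\mu_{Q,z}\} \tag{3.96}$$

is at most countable, we can find $\varepsilon_2\in[\varepsilon_1,\varepsilon_1+\varepsilon_1^2)$ (note that still $\varepsilon_2<2\varepsilon_1$) and $\delta>0$ such that

$$W_Q\left(\left\{Q'\in\mathscr P^{\mathrm{erg,fin}}(\widetilde E^{\mathbb N})\colon\begin{array}{l}Q'(z)/Q_r(z)\in\big[1-(\varepsilon_2+\delta)\delta_{Q_r}^W,\,1-(\varepsilon_2-\delta)\delta_{Q_r}^W\big]\ \text{or}\\ Q'(z)/Q_r(z)\in\big[1+(\varepsilon_2-\delta)\delta_{Q_r}^W,\,1+(\varepsilon_2+\delta)\delta_{Q_r}^W\big]\\ \text{for some }r\in\{1,\dots,R\}\text{ and }z\in\mathcal A_r\end{array}\right\}\right)\le\frac{\varepsilon}{1\vee K_0\vee m^*K_1}. \tag{3.97}$$

Define "disjointified" versions of the $U_{(\varepsilon_1,L)}(Q_r)$ as follows. For $r=1,\dots,R$, put iteratively

$$U_r:=\left\{Q'\in\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})\colon\begin{array}{l}Q'(z)\in Q_r(z)\big(1-\varepsilon_2\delta_{Q_r}^W,\,1+\varepsilon_2\delta_{Q_r}^W\big)\text{ for all }z\in\mathcal A_r,\text{ and}\\ \text{for each }r'<r\text{ there is }z'\in\mathcal A_{r'}\text{ such that}\\ Q'(z')\notin Q_{r'}(z')\big[1-(\varepsilon_2+\delta)\delta_{Q_{r'}}^W,\,1+(\varepsilon_2+\delta)\delta_{Q_{r'}}^W\big]\end{array}\right\}. \tag{3.98}$$

It may happen that some of the $U_r$ are empty or satisfy $W_Q(U_r)=0$. We then (silently) remove these and re-number the remaining ones. Note that each $U_r$ is an open subset of $\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})$ and

$$W_Q\Big(\bigcup_{r=1}^RU_r\Big)=\sum_{r=1}^RW_Q(U_r)>1-2\varepsilon, \tag{3.99}$$

since $W_Q\big(\mathscr C'\setminus\bigcup_{r=1}^RU_r\big)<\varepsilon$.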
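For $r=1,\dots,R$, we have, using (3.82)–(3.83) and the choice of $\varepsilon_2$ $(<2\varepsilon_1<\delta_0)$,

$$W_Q(U_r\cap\mathscr C')\big[H(Q_r\mid q_{\rho,\nu}^{\otimes\mathbb N})+\varepsilon\big]\ge\frac1{1+\varepsilon}\int_{U_r\cap\mathscr C'}H(Q'\mid q_{\rho,\nu}^{\otimes\mathbb N})\,W_Q(dQ'), \tag{3.100}$$

$$W_Q(U_r\cap\mathscr C')\big[m_{Q_r}H(\Psi_{Q_r}\mid\nu^{\otimes\mathbb N})+\varepsilon\big]\ge\frac1{1+\varepsilon}\int_{U_r\cap\mathscr C'}m_{Q'}H(\Psi_{Q'}\mid\nu^{\otimes\mathbb N})\,W_Q(dQ'), \tag{3.101}$$

so that altogether, using (3.90)–(3.91),

$$\sum_{r=1}^RW_Q(U_r)\big[H(Q_r\mid q_{\rho,\nu}^{\otimes\mathbb N})+(\alpha-1)m_{Q_r}H(\Psi_{Q_r}\mid\nu^{\otimes\mathbb N})\big]\ge\frac1{1+\varepsilon}\int\big[H(Q'\mid q_{\rho,\nu}^{\otimes\mathbb N})+(\alpha-1)m_{Q'}H(\Psi_{Q'}\mid\nu^{\otimes\mathbb N})\big]\,W_Q(dQ')-2\alpha\varepsilon. \tag{3.102}$$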

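3.5.4 More layers: long sentences with the right pattern frequencies

For $z\in\bigcup_{n\in\mathbb N}\widetilde E^n$ and $\xi=(\xi^{(1)},\dots,\xi^{(M)})\in\widetilde E^M$ (with $M>|z|$), let

$$\mathrm{freq}_z(\xi):=\frac1M\big|\{1\le i\le M-|z|+1\colon(\xi^{(i)},\dots,\xi^{(i+|z|-1)})=z\}\big| \tag{3.103}$$

be the empirical frequency of $z$ in $\xi$. Note that, for any $P\in\mathscr P^{\mathrm{erg,fin}}(\widetilde E^{\mathbb N})$, $z\in\bigcup_{n\in\mathbb N}\widetilde E^n$ and $\varepsilon'>0$, we have, by the ergodic theorem (and provided $P(z)>0$ in (3.104)),

$$\lim_{M\to\infty}P\big(\{\xi\in\widetilde E^M\colon\mathrm{freq}_z(\xi)\in P(z)(1-\varepsilon',1+\varepsilon')\}\big)=1 \tag{3.104}$$

and

$$\lim_{M\to\infty}P\big(\{\xi\in\widetilde E^M\colon|\kappa(\xi)|\in M(m_P-\varepsilon',m_P+\varepsilon')\}\big)=1. \tag{3.105}$$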
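The empirical frequency (3.103) is easy to state in code. The sketch below is a minimal illustration; sentences are represented, as an assumption, by tuples of words. Note that (3.103) divides by $M$ and not by the number $M-|z|+1$ of admissible positions. The second example hints at (3.104): for i.i.d. words drawn uniformly from two hypothetical words `"x"` and `"y"`, the frequency of the pair $z=(x,y)$ settles near $P(z)=1/4$.

```python
import random


def freq(z, xi):
    """Empirical frequency (3.103) of the word block z in the sentence xi.

    The text's 1-based index i with 1 <= i <= M - |z| + 1 becomes
    range(M - |z| + 1) here (0-based); the count is divided by M.
    """
    M, k = len(xi), len(z)
    if M <= k:
        raise ValueError("(3.103) assumes M > |z|")
    z = tuple(z)
    hits = sum(1 for i in range(M - k + 1) if tuple(xi[i:i + k]) == z)
    return hits / M


xi = ("a", "b", "a", "b", "a")
print(freq(("a", "b"), xi))  # 0.4

rng = random.Random(7)
long_xi = tuple(rng.choice(["x", "y"]) for _ in range(20_000))
print(freq(("x", "y"), long_xi))  # close to P(z) = 1/4
```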



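For $M\in\mathbb N$ and $r\in\{1,\dots,R\}$, put

$$\mathcal V_{r,M}:=\left\{\xi\in\widetilde E^M\colon\begin{array}{l}|\kappa(\xi)|\in M(m_{Q_r}-\varepsilon_2,m_{Q_r}+\varepsilon_2),\\ \mathrm{freq}_z(\xi)\in Q_r(z)\big(1-\varepsilon_2\delta_{Q_r}^W,1+\varepsilon_2\delta_{Q_r}^W\big)\text{ for all }z\in\mathcal A_r,\\ \text{and for each }r'<r\text{ there is a }z'\in\mathcal A_{r'}\text{ such that}\\ \mathrm{freq}_{z'}(\xi)\notin Q_{r'}(z')\big[1-(\varepsilon_2+\delta)\delta_{Q_{r'}}^W,1+(\varepsilon_2+\delta)\delta_{Q_{r'}}^W\big]\end{array}\right\}. \tag{3.106}$$

Note that when $|E|<\infty$, also $|\mathcal V_{r,M}|<\infty$. Furthermore, $\mathcal V_{r,M}\cap\mathcal V_{r',M}=\emptyset$ for $r\neq r'$. For $\xi\in\mathcal V_{r,M}$, we have

$$\big|\{1\le i\le M-M_r+1\colon(\xi^{(i)},\xi^{(i+1)},\dots,\xi^{(i+M_r-1)})\in\mathcal A_r\}\big|\ge M(1-2\varepsilon_2); \tag{3.107}$$

in particular, there are at least $K_r:=\lfloor M(1-3\varepsilon_2)/M_r\rfloor$ elements $z_1,\dots,z_{K_r}\in\mathcal A_r$ (not necessarily distinct) appearing in this order as disjoint subwords of $\xi$. The $z_i$'s can for example be constructed in a "greedy" way, parsing $\xi$ from left to right as in Section 3.2 (see, in particular, (3.21)). This implies, in particular, that

$$\begin{aligned}\prod_{i=1}^M\rho(|\xi^{(i)}|)&\le\prod_{i=1}^{K_r}\prod_{k=1}^{M_r}\rho(|z_i^{(k)}|)\le\exp\big[(1-\varepsilon_2)K_rM_r\,\mathbb E_{Q_r}[\log\rho(\tau_1)]\big]\\&\le\exp\big[(1-4\varepsilon_2)M\,\mathbb E_{Q_r}[\log\rho(\tau_1)]\big]\le\exp\big[M\,\mathbb E_{Q_r}[\log\rho(\tau_1)]+Mc'\varepsilon\big]\end{aligned} \tag{3.108}$$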
if $M$ is large enough, where $c':=\sup_{k\in\operatorname{supp}(\rho)}\{-\log(\rho(k))/k\}$ $(<\infty)$ and we use that $\varepsilon_2m^*<\varepsilon$ by definition. Furthermore, for each $r\in\{1,\dots,R\}$ and $\eta\in\mathcal V_{r,M}$, we have

$$\big|\{\zeta\in\mathcal V_{r,M}\colon\kappa(\zeta)=\kappa(\eta)\}\big|\le\exp\big[M(H_{\tau|K}(Q_r)+\delta_1)\big], \tag{3.109}$$

where $\delta_1$ can be made arbitrarily small by choosing $\varepsilon$ small. (Note that the quantity on the left-hand side is the number of ways in which $\kappa(\eta)$ can be "re-cut" to obtain another element of $\mathcal V_{r,M}$.) In order to check (3.109), we note that any $\zeta\in\mathcal V_{r,M}$ must contain at least $K_r$ disjoint subsentences from $\mathcal A_r$, and each $z\in\mathcal A_r\subset\widetilde E^{M_r}$ satisfies $|\kappa(z)|\ge L$. Hence there are at most

$$\binom{M(m_{Q_r}+\varepsilon_2)-K_r(L-1)}{K_r}\le2^{4\varepsilon_2Mm_{Q_r}}\le2^{4\varepsilon_2Mm^*} \tag{3.110}$$

choices for the positions in the letter sequence $\kappa(\eta)$ where the concatenations of the disjoint subsentences from $\mathcal A_r$ can begin, and there are at most

$$\binom{M-K_r(M_r-1)}{K_r}\le2^{3\varepsilon_2M} \tag{3.111}$$

choices for the positions in the word sequence $\zeta$ where the subsentences from $\mathcal A_r$ can begin. By construction (recall the last line of (3.69)), each $z\in\mathcal A_r$ can be "re-cut" in not more than $\exp[(L/m_{Q_r})(H_{\tau|K}(Q_r)+\varepsilon_2)]$ many ways. Combining these observations with the fact that

$$\Big(\exp\big[(L/m_{Q_r})(H_{\tau|K}(Q_r)+\varepsilon_2)\big]\Big)^{K_r}\le\exp\big[M(H_{\tau|K}(Q_r)+\varepsilon_2)\big] \tag{3.112}$$

(because $K_r\le M/M_r$ and $L/m_{Q_r}\le M_r$), we get (3.109) with $\delta_1:=\varepsilon_2+3\varepsilon_2\log2+4\varepsilon_2m^*\log2$.

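We see from (3.104)–(3.105) and the definitions of $U_r$ and $\mathcal V_{r,M}$ that, for any $\varepsilon'>0$,

$$\bigcup_{M_0\in\mathbb N}\ \bigcap_{M\ge M_0}\{P\in U_r\colon P(\mathcal V_{r,M})>1-\varepsilon'\}\supset U_r\cap\mathscr P^{\mathrm{erg,fin}}(\widetilde E^{\mathbb N}). \tag{3.113}$$

Put $\varepsilon_3:=\varepsilon_2\min_{r=1,\dots,R}W_Q(U_r)$ $(<\varepsilon_2)$. We can choose $M$ so large that

$$W_Q\big(\{P\in U_r\colon P(\mathcal V_{r,M})>1-\varepsilon_3/3\}\big)>W_Q(U_r)(1-\varepsilon_2/2),\qquad r=1,\dots,R. \tag{3.114}$$

For $M'>M$ and $r=1,\dots,R$, put

$$\mathcal W_{r,M'}:=\Big\{\zeta\in\widetilde E^{M'}\colon\sum_{\eta\in\mathcal V_{r,M}}\mathrm{freq}_\eta(\zeta)>1-\varepsilon_3/2\Big\}. \tag{3.115}$$

Note that for $r\neq r'$ (because $\mathcal V_{r,M}\cap\mathcal V_{r',M}=\emptyset$) there cannot be much overlap between $\zeta\in\mathcal W_{r,M'}$ and $\eta\in\mathcal W_{r',M'}$:

$$\max\{k\colon k\text{-suffix of }\zeta=k\text{-prefix of }\eta\}\le\varepsilon_3M' \tag{3.116}$$

(here, the $k$-prefix of $\eta\in\widetilde E^n$, $k\le n$, consists of the first $k$ words, the $k$-suffix of the last $k$ words). To see this, note that any subsequence of length $k$ of $\zeta$ must contain at least $(k-\varepsilon_3M'/2)_+$ positions where a sentence from $\mathcal V_{r,M}$ starts, and any subsequence of length $k$ of $\eta$ must contain at least $(k-\varepsilon_3M'/2)_+$ positions where a sentence from $\mathcal V_{r',M}$ starts, so any $k$ appearing in (3.116) must satisfy $2(k-\varepsilon_3M'/2)_+\le k$, which enforces $k\le\varepsilon_3M'$.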
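The overlap quantity in (3.116) is the length of the longest word block that is both a suffix of $\zeta$ and a prefix of $\eta$. A direct Python sketch, representing sentences as sequences of words (our assumption), is:

```python
def max_overlap(zeta, eta):
    """Largest k such that the k-suffix of zeta equals the k-prefix of eta, cf. (3.116).

    Sentences are sequences of words; k = 0 means that there is no overlap at all.
    """
    best = 0
    for k in range(1, min(len(zeta), len(eta)) + 1):
        if tuple(zeta[-k:]) == tuple(eta[:k]):
            best = k
    return best


print(max_overlap(("x", "y", "z"), ("y", "z", "w")))  # 2
```

The argument after (3.116) shows that, for $\zeta\in\mathcal W_{r,M'}$ and $\eta\in\mathcal W_{r',M'}$ with $r\neq r'$, this number is at most $\varepsilon_3M'$. The sketch only computes the quantity and does not check that bound.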

Observe that (3.115) implies that we may choose $M'$ so large that, for $r=1,\dots,R$,

$$\text{each }\zeta\in\mathcal W_{r,M'}\text{ contains at least }(1-\varepsilon_3)\frac{M'}{M}\text{ disjoint subsentences from }\mathcal V_{r,M}. \tag{3.117}$$
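For $P\in\mathscr P^{\mathrm{erg,fin}}(\widetilde E^{\mathbb N})$ with $P(\mathcal V_{r,M})>1-\varepsilon_3/3$ we have

$$\lim_{M'\to\infty}P(\mathcal W_{r,M'})=1, \tag{3.118}$$

and hence

$$\bigcup_{M''>M}\ \bigcap_{M'\ge M''}\{P\in U_r\colon P(\mathcal W_{r,M'})>1-\varepsilon_2\}\supset\{P\in U_r\cap\mathscr P^{\mathrm{erg,fin}}(\widetilde E^{\mathbb N})\colon P(\mathcal V_{r,M})>1-\varepsilon_3/3\}, \tag{3.119}$$

and so we can choose $M'$ so large that

$$W_Q\big(\{P\in U_r\colon P(\mathcal W_{r,M'})>1-\varepsilon_2\}\big)>W_Q(U_r)(1-\varepsilon_2),\qquad r=1,\dots,R. \tag{3.120}$$

Now define

$$O(Q):=\big\{Q'\in\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})\colon Q'(\mathcal W_{r,M'})>W_Q(U_r)(1-2\varepsilon_2),\ r=1,\dots,R\big\}. \tag{3.121}$$

Note that $O(Q)$ is open in the weak topology on $\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})$, since it is defined in terms of requirements on certain finite marginals of $Q'$, and that for $r=1,\dots,R$,

$$Q(\mathcal W_{r,M'})=\int Q'(\mathcal W_{r,M'})\,W_Q(dQ')\ge\int_{U_r}Q'(\mathcal W_{r,M'})\,W_Q(dQ')>(1-\varepsilon_2)^2\,W_Q(U_r) \tag{3.122}$$

by (3.120), so that in fact $Q\in O(Q)$.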

3.5.5 Estimating the large deviation probability: good loops and filling loops

Consider a choice of "cut-points" $j_1<\dots<j_N$ as appearing in the sum in (3.23). Note that, by the definition of $O(Q)$ (recall (3.121)),

$$R_N^{j_1,\dots,j_N}(X)\in O(Q) \tag{3.123}$$

enforces

$$\big|\{1\le i\le N-M'\colon(X|_{(j_{i-1},j_i]},\dots,X|_{(j_{i+M'-2},j_{i+M'-1}]})\in\mathcal W_{r,M'}\}\big|\ge NW_Q(U_r)(1-3\varepsilon_2),\qquad r=1,\dots,R, \tag{3.124}$$

when $N$ is large enough. This fact, together with (3.116), enables us to pick at least

$$J:=\sum_{r=1}^R\big\lceil(1-4\varepsilon_2)(N/M')\,W_Q(U_r)\big\rceil \tag{3.125}$$

subsentences $\zeta_1,\dots,\zeta_J$ occurring as disjoint subsentences in this order on $\xi_N$ such that

$$|\{1\le j\le J\colon\zeta_j\in\mathcal W_{r,M'}\}|\ge(1-4\varepsilon_2)\,W_Q(U_r)\frac N{M'},\qquad r=1,\dots,R, \tag{3.126}$$

where we note that $J\ge(1-4\varepsilon_2)(1-2\varepsilon)(N/M')$ $(\ge(1-8\varepsilon)(N/M'))$ by (3.99). Indeed, we can for example construct these $\zeta_j$'s iteratively in a "greedy" way, parsing through $\xi_N$ from left to right and always picking the next possible subsentence from one of the $R$ types whose count does not yet exceed $(1-4\varepsilon_2)W_Q(U_r)(N/M')$, as follows. Let $k_{s,r}$ be the total number of subsentences of type $r$ we have chosen after the $s$-th step ($k_{0,1}=\dots=k_{0,R}=0$). If in the $s$-th step we have picked $\zeta_s$ at position $p$, then let

$$p':=\min\{i>p+M'\colon\text{at position }i\text{ in }\xi_N\text{ starts a sentence from }\mathcal W_{u,M'}\text{ for some }u\in\mathcal U_s\}, \tag{3.127}$$
where $\mathcal U_s:=\{r\colon k_{s,r}<(1-4\varepsilon_2)W_Q(U_r)(N/M')\}$, pick the next subsentence $\zeta_{s+1}$ starting at position $p'$ (say, of type $u$) and increase the corresponding $k_{s+1,u}$. Repeat this until $k_{s,r}\ge(1-4\varepsilon_2)W_Q(U_r)(N/M')$ for $r=1,\dots,R$.

In order to verify that this algorithm does not get stuck, let $\mathrm{rem}(s,r)$ be the "remaining" number of positions (to the right of the position where the subsentence was picked in the $s$-th step) where a subsentence from $\mathcal W_{r,M'}$ begins on $\xi_N$. By (3.124), we have

$$\mathrm{rem}(0,r)\ge NW_Q(U_r)(1-3\varepsilon_2). \tag{3.128}$$

If in the $s$-th step a subsentence of type $r$ is picked, then we have $\mathrm{rem}(s+1,r)\ge\mathrm{rem}(s,r)-M'$, and for $r'\neq r$ we have $\mathrm{rem}(s+1,r')\ge\mathrm{rem}(s,r')-\varepsilon_3M'$ by (3.116). Thus,

$$\begin{aligned}\mathrm{rem}(s,r)&\ge\mathrm{rem}(0,r)-k_{s,r}M'-(s-k_{s,r})\varepsilon_3M'\\&=\mathrm{rem}(0,r)-k_{s,r}(1-\varepsilon_3)M'-s\varepsilon_3M',\end{aligned} \tag{3.129}$$

which is $>0$ as long as $k_{s,r}<(1-4\varepsilon_2)W_Q(U_r)(N/M')$ and $s<J$.
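The greedy selection around (3.127) can be sketched in a few lines of Python. The representation is a hypothetical one chosen for illustration. `starts[r]` lists the positions in $\xi_N$ where a sentence of type $r$ (that is, from $\mathcal W_{r,M'}$) begins. `quotas[r]` plays the role of $(1-4\varepsilon_2)W_Q(U_r)N/M'$, rounded to an integer. `m_prime` is $M'$. Following (3.127) literally, the next pick must start strictly after $p+M'$, so the picked subsentences are pairwise disjoint.

```python
def greedy_pick(starts, quotas, m_prime):
    """Greedy left-to-right selection of typed, disjoint subsentences, cf. (3.127).

    starts[r] : sorted positions where a sentence of type r begins
    quotas[r] : number of subsentences of type r we want to collect
    m_prime   : common length M' of the subsentences
    Returns the list of picked (position, type) pairs in left-to-right order.
    """
    counts = {r: 0 for r in starts}
    picks = []
    p = -(m_prime + 1)  # so that the first admissible position is 0
    while True:
        active = [r for r in starts if counts[r] < quotas[r]]  # the set U_s
        if not active:
            break
        candidates = [(i, r) for r in active for i in starts[r] if i > p + m_prime]
        if not candidates:
            break  # stuck; the argument after (3.128) rules this out in the setting of the text
        i, r = min(candidates)  # leftmost admissible start p'
        picks.append((i, r))
        counts[r] += 1
        p = i
    return picks


starts = {1: [0, 3, 10, 20], 2: [5, 15]}
print(greedy_pick(starts, quotas={1: 2, 2: 1}, m_prime=3))  # [(0, 1), (5, 2), (10, 1)]
```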

A. Combinatorial consequences. By (3.117) and (3.126), $R_N^{j_1,\dots,j_N}(X)\in O(Q)$ implies that $\xi_N$ contains at least

$$C:=\sum_{r=1}^R\Big\lceil(1-4\varepsilon_2)W_Q(U_r)\frac N{M'}\Big\rceil\Big\lceil(1-\varepsilon_3)\frac{M'}M\Big\rceil\quad\Big(\ge(1-5\varepsilon_2)(1-2\varepsilon)\frac NM\Big) \tag{3.130}$$

disjoint subsentences $\eta_1,\dots,\eta_C$ (appearing in this order in $\xi_N$) such that at least

$$\frac NM(1-6\varepsilon_2)W_Q(U_r)\ \text{of the }\eta_c\text{'s are from }\mathcal V_{r,M},\qquad r=1,\dots,R. \tag{3.131}$$

Let $k_1,\dots,k_C$ ($k_{c+1}\ge k_c+M$, $1\le c<C$) be the indices where the disjoint subsentences $\eta_c$ start in $\xi_N$, i.e.,

$$\eta_c=\big(\xi_N^{(k_c)},\xi_N^{(k_c+1)},\dots,\xi_N^{(k_c+M-1)}\big)\in\mathcal V_{r_c,M},\qquad c=1,\dots,C, \tag{3.132}$$

and the $r_c$'s must respect the frequencies dictated by the $W_Q(U_r)$'s as in (3.131). Thus, each choice $(j_1,\dots,j_N)$ yielding a non-zero summand in (3.23) leads to a triple

$$\big((\ell_1,\dots,\ell_C),\ (r_1,\dots,r_C),\ (\tilde\eta_1,\dots,\tilde\eta_C)\big) \tag{3.133}$$

such that $\tilde\eta_c\in\kappa(\mathcal V_{r_c,M})$, $\ell_{c+1}\ge\ell_c+|\tilde\eta_c|$, the $r_c$'s respect the frequencies as in (3.131), and

$$\text{the word }\tilde\eta_c\text{ starts at position }\ell_c\text{ in }X\text{ for }c=1,\dots,C. \tag{3.134}$$

As in Section 3.3, we call such triples good, the loops inside the subsentences $\eta_c$ good loops, and the others filling loops.

Fix a good triple for the moment. In order to count how many choices of $j_1<\dots<j_N$ can lead to this particular triple and to estimate their contribution, observe the following:

1. There are at most

$$\binom{N-C(M-1)}{C}\le\exp(\delta_1'N) \tag{3.135}$$

choices for the $k_1<\dots<k_C$, where $\delta_1'$ can be made arbitrarily small by choosing $\varepsilon$ small and $M$ large.
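2. Once the $k_c$'s are fixed, by (3.109) and (3.131) there are at most

$$\prod_{r=1}^R\Big(\exp\big[M(H_{\tau|K}(Q_r)+\delta_1)\big]\Big)^{W_Q(U_r)N/M}=\exp\Big[N\sum_{r=1}^RW_Q(U_r)\big(H_{\tau|K}(Q_r)+\delta_1\big)\Big] \tag{3.136}$$

choices for the good loops and, by (3.108), for each choice of the good loops the product of the $\rho(j_k-j_{k-1})$'s inside the good loops is at most

$$\prod_{r=1}^R\Big(\exp\big[M\,\mathbb E_{Q_r}[\log\rho(\tau_1)]+Mc'\varepsilon\big]\Big)^{W_Q(U_r)N/M}\le\exp\Big[Nc'\varepsilon+N\sum_{r=1}^RW_Q(U_r)\,\mathbb E_{Q_r}[\log\rho(\tau_1)]\Big]. \tag{3.137}$$

3. For each choice of the $k_c$'s, the contribution of the filling loops to the weight is

$$\begin{aligned}&\prod_{c=1}^C\rho^{*(k_c-k_{c-1}-M)}\big(\ell_c-\ell_{c-1}-|\tilde\eta_{c-1}|\big)\\&\quad\le(C_\rho\vee1)^C\prod_{c=1}^{C-1}(k_{c+1}-k_c-M)^{\alpha+1}\prod_{c=1}^C\big((\ell_c-\ell_{c-1}-|\tilde\eta_{c-1}|)\vee1\big)^{-\alpha}\\&\quad\le(C_\rho\vee1)^C\Big(\frac{N-CM}{C}\Big)^{(\alpha+1)C}\prod_{c=1}^C\big((\ell_c-\ell_{c-1}-|\tilde\eta_{c-1}|)\vee1\big)^{-\alpha}\\&\quad\le e^{\delta_2'N}\prod_{c=1}^C\big((\ell_c-\ell_{c-1}-|\tilde\eta_{c-1}|)\vee1\big)^{-\alpha},\end{aligned} \tag{3.138}$$

where $\delta_2'$ can be made arbitrarily small by choosing $\varepsilon$ small and $M$ large (and we interpret $\ell_0=0$, $|\tilde\eta_0|=0$). Here, we have used Lemma 2.3 in the first inequality, as well as the fact that the product $\prod_{c=1}^{C-1}(k_{c+1}-k_c-M)$ is maximal when all factors are equal in the second inequality.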
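The bound (3.135) says that placing the starting indices of the good subsentences costs only a small exponential rate once $M$ is large. A quick numerical check of this uses `math.lgamma` to evaluate $\frac1N\log\binom{N-C(M-1)}{C}$. As a hypothetical stand-in for (3.130), the sketch takes $C=\lfloor(1-\varepsilon)N/M\rfloor$. The helper names are ours.

```python
import math


def log_binom(n, k):
    """Natural logarithm of the binomial coefficient C(n, k), computed via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def start_choice_rate(N, M, eps):
    """(1/N) log C(N - C(M-1), C) with C = floor((1 - eps) N / M).

    This is the exponential growth rate of the number of ways to place
    k_1 < ... < k_C as in (3.135); C stands in for the count (3.130).
    """
    C = math.floor((1 - eps) * N / M)
    n = N - C * (M - 1)  # n >= C because C * M <= N
    return log_binom(n, C) / N


for M in (10, 100, 1000):
    print(M, start_choice_rate(10**6, M, eps=0.01))
```

The printed rates decrease as $M$ grows. This matches the statement that $\delta_1'$ can be made small by taking $M$ large, with $\varepsilon$ small.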

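Combining (3.135)–(3.138), we see that

$$\begin{aligned}\mathbb P(R_N\in O(Q)\mid X)&\le e^{(\delta_1'+\delta_2'+\delta_1+\varepsilon c')N}\exp\Big[N\sum_{r=1}^RW_Q(U_r)\big(H_{\tau|K}(Q_r)+\mathbb E_{Q_r}[\log\rho(\tau_1)]\big)\Big]\\&\quad\times\sum_{\substack{(\ell_c),(r_c),(\tilde\eta_c)\\\text{good}}}\ \prod_{c=1}^C\big((\ell_c-\ell_{c-1}-|\tilde\eta_{c-1}|)\vee1\big)^{-\alpha}.\end{aligned} \tag{3.139}$$

We claim that $X$-a.s.

$$\limsup_{N\to\infty}\frac1N\log\sum_{\substack{(\ell_c),(r_c),(\tilde\eta_c)\\\text{good}}}\ \prod_{c=1}^C\big((\ell_c-\ell_{c-1}-|\tilde\eta_{c-1}|)\vee1\big)^{-\alpha}\le\delta_2-\alpha\sum_{r=1}^RW_Q(U_r)\,m_{Q_r}H(\Psi_{Q_r}\mid\nu^{\otimes\mathbb N}), \tag{3.140}$$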
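where $\delta_2$ can be made arbitrarily small by choosing $\varepsilon$ small and $L$ large. A proof of this is given below. Observe next that (3.139)–(3.140), together with the identity $H_{\tau|K}(Q_r)+\mathbb E_{Q_r}[\log\rho(\tau_1)]=m_{Q_r}H(\Psi_{Q_r}\mid\nu^{\otimes\mathbb N})-H(Q_r\mid q_{\rho,\nu}^{\otimes\mathbb N})$, yield that $X$-a.s. (with $\delta:=\delta_1'+\delta_2'+\delta_1+\delta_2+\varepsilon c'$)

$$\begin{aligned}\limsup_{N\to\infty}\frac1N\log\mathbb P(R_N\in O(Q)\mid X)&\le\delta-\sum_{r=1}^RW_Q(U_r)\big(H(Q_r\mid q_{\rho,\nu}^{\otimes\mathbb N})+(\alpha-1)m_{Q_r}H(\Psi_{Q_r}\mid\nu^{\otimes\mathbb N})\big)\\&\le\delta+2\alpha\varepsilon-\frac1{1+\varepsilon}\int_{\mathscr P^{\mathrm{erg}}(\widetilde E^{\mathbb N})}\big(H(Q'\mid q_{\rho,\nu}^{\otimes\mathbb N})+(\alpha-1)m_{Q'}H(\Psi_{Q'}\mid\nu^{\otimes\mathbb N})\big)\,W_Q(dQ')\\&=-\frac{I^{\mathrm{fin}}(Q)}{1+\varepsilon}+\delta+2\alpha\varepsilon\end{aligned} \tag{3.141}$$

(use (3.102) for the second inequality, and see (6.3) for the last equality), which completes the proof.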

B. Coarse-graining $X$ with $R$ colours. It remains to verify (3.140), for which we employ a coarse-graining scheme similar to the one used in Section 3.4 (with block lengths $\lfloor(1-\varepsilon_2)L\rfloor$, etc.). To ease notation, we silently replace $L$ by $(1-\varepsilon_2)L$ in the following. Split $X$ into blocks of $L$ consecutive letters, and define a $\{0,1\}$-valued array $A_{l,r}$, $l\in\mathbb N$, $r\in\{1,\dots,R\}$, as in Section 3.4 inductively: For each $r$, put $A_{0,r}:=0$ and, given that $A_{0,r},A_{1,r},\dots,A_{l-1,r}$ have been assigned values, define $A_{l,r}$ as follows:

(1) If $A_{l-1,r}=0$, then

$$A_{l,r}:=\begin{cases}1,&\text{if in }X\text{ there is a word from }\kappa(\mathcal A_r)\text{ starting in }((l-1)L,lL],\\0,&\text{otherwise}.\end{cases} \tag{3.142}$$

(2) If $A_{l-1,r}=1$, then

$$A_{l,r}:=\begin{cases}1,&\text{if in }X\text{ there are two words from }\kappa(\mathcal A_r)\text{ starting in }((l-2)L,(l-1)L],\\&\text{respectively, }((l-1)L,lL]\text{, and occurring disjointly},\\0,&\text{otherwise}.\end{cases} \tag{3.143}$$

Put

$$p_r:=L\exp\big[-(1-2\varepsilon_2)LH(\Psi_{Q_r}\mid\nu^{\otimes\mathbb N})\big]. \tag{3.144}$$

Arguing as in Section 3.4, we can couple the $(A_{l,r})_{l\in\mathbb N,1\le r\le R}$ with an array $\omega=(\omega_{l,r})_{l\in\mathbb N,1\le r\le R}$ such that $A_{l,r}\le\omega_{l,r}$ and the sequence $\big((\omega_{l,1},\dots,\omega_{l,R})\big)_{l\in\mathbb N}$ is i.i.d. with $\mathbb P(\omega_{l,r}=1)=p_r$. In particular, for each $r$, $(\omega_{l,r})_{l\in\mathbb N}$ is a Bernoulli$(p_r)$-sequence. There may be (and certainly will be if $\Psi_{Q_r}$ and $\Psi_{Q_{r'}}$ are similar) an arbitrary dependence between the $\omega_{l,1},\dots,\omega_{l,R}$ for fixed $l$, but this will be harmless in the low-density limit we are interested in.
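For a single colour $r$, the inductive rule (3.142)–(3.143) can be transcribed directly into Python. The input format below is our assumption. `occurrences` lists pairs (start, length) of words from $\kappa(\mathcal A_r)$ found in $X$, with 1-based letter positions, so that block $l$ covers the positions $((l-1)L,lL]$. The sketch only computes the array. It does not attempt the coupling with Bernoulli$(p_r)$ variables described in the text.

```python
def coarse_grain(occurrences, L, n_blocks):
    """The {0,1}-array A_1, ..., A_n of (3.142)-(3.143) for one colour r.

    occurrences: (start, length) of words from kappa(A_r) in X, 1-based letters;
                 a word starting at position s lies in block (s - 1) // L + 1.
    """
    by_block = {}
    for start, length in occurrences:
        by_block.setdefault((start - 1) // L + 1, []).append((start, length))

    def disjoint(a, b):
        (sa, la), (sb, lb) = a, b
        return sa + la - 1 < sb or sb + lb - 1 < sa

    A = [0]  # A_0 := 0
    for l in range(1, n_blocks + 1):
        here = by_block.get(l, [])
        if A[l - 1] == 0:  # rule (3.142): some word starts in block l
            A.append(1 if here else 0)
        else:  # rule (3.143): disjoint words starting in blocks l-1 and l
            prev = by_block.get(l - 1, [])
            A.append(1 if any(disjoint(a, b) for a in prev for b in here) else 0)
    return A[1:]


print(coarse_grain([(3, 12), (12, 5), (35, 4)], L=10, n_blocks=4))  # [1, 0, 0, 1]
print(coarse_grain([(3, 12), (15, 3), (35, 4)], L=10, n_blocks=4))  # [1, 1, 0, 1]
```

In the first example the words starting in blocks 1 and 2 overlap, so $A_2=0$. In the second example they are disjoint, which continues the run of ones.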

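For $r\in\{1,\dots,R\}$, put $d_r:=W_Q(U_r)(1-6\varepsilon_2)$ and $D_r:=\lfloor(1-\varepsilon_2)Mm_{Q_r}/L\rfloor$. If $\eta_c\in\mathcal V_{r,M}$, then

$$|\kappa(\eta_c)|\in Mm_{Q_r}(1-\varepsilon_2,1+\varepsilon_2), \tag{3.145}$$

so $\kappa(\eta_c)$ covers at least $D_{r_c}$ consecutive $L$-blocks of the coarse-graining. Furthermore, as $\eta_c$ in turn contains at least $D_{r_c}(1-3\varepsilon_2)$ disjoint subsentences from $\mathcal A_{r_c}$, we see that at least $D_{r_c}(1-3\varepsilon_2)$ of these blocks must have $A_{k,r_c}=1$. Thus, for fixed $X$, we read off from each good triple $(\ell_c),(r_c),(\tilde\eta_c)$ numbers $m_1<\dots<m_C$ such that

$$\begin{aligned}&m_{c+1}\ge m_c+D_{r_c},&&c=1,\dots,C-1,\\&|\{m_c\le k<m_c+D_{r_c}\colon A_{k,r_c}=1\}|\ge D_{r_c}(1-3\varepsilon_2),&&c=1,\dots,C,\\&|\{1\le c\le C\colon r_c=r\}|\ge d_rC,&&r=1,\dots,R,\end{aligned} \tag{3.146}$$

where $m_c$ is the index of the $L$-block that contains $\ell_c$. Furthermore, note that for a given "coarse-graining" $(m_c)$ and $(r_c)$ satisfying (3.146), there are at most

$$L^C\big(2\varepsilon_2M\max_rm_{Q_r}\big)^C\le\exp(\delta_3N) \tag{3.147}$$

choices for $\ell_c$ and $\tilde\eta_c$ that lead to a good triple $(\ell_c),(r_c),(\tilde\eta_c)$ with this particular coarse-graining. Indeed, for each $c=1,\dots,C$ there are at most $L$ choices for $\ell_c$ and, since each $\eta\in\mathcal V_{r,M}$ satisfies

$$|\kappa(\eta)|\in Mm_{Q_r}(1-\varepsilon_2,1+\varepsilon_2), \tag{3.148}$$

there are at most $2\varepsilon_2Mm_{Q_r}$ choices for $\tilde\eta_c$ (note that once $\ell_c$ is fixed as a "starting point" for a word on $X$, choosing $\tilde\eta_c$ in fact amounts to choosing an "endpoint"). Note that $\delta_3$ can be made arbitrarily small by choosing $\varepsilon$ small and $M$ large. Finally, (3.147) and Lemma 3.3 (applied with $\varepsilon$ replaced by $3\varepsilon_2$) yield (3.140). Indeed, since

$$\limsup_{N\to\infty}\frac CN\le\frac1M \tag{3.149}$$

and, for $\gamma\in(1/\alpha,1)$ and with $d_r$, $D_r$, $p_r$ as above,

$$\frac1\gamma\Big(\log\zeta(\alpha\gamma)+h(d)+d_0\log R+(\log2)\sum_{r=1}^Rd_rD_r+(1-3\varepsilon_2)\sum_{r=1}^Rd_rD_r\log p_r\Big)\le-\frac M\gamma\sum_{r=1}^RW_Q(U_r)\,m_{Q_r}H(\Psi_{Q_r}\mid\nu^{\otimes\mathbb N})+\big(8\varepsilon_2m^*K_1+o(1)\big)\frac M\gamma, \tag{3.150}$$

where $o(1)$ refers to $L,M\to\infty$, by choosing $\varepsilon$ small (note that $\varepsilon_2m^*K_1<\varepsilon$), $L$ and $M$ large, and $\gamma$ sufficiently close to $1/\alpha$, the right-hand side of (3.154), multiplied by $C/N$, is smaller than the right-hand side of (3.140).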

3.5.6 A multicolour version of the core lemma

The following is an extension of Lemma 2.1. Let $R\in\mathbb N$, $\omega_i=(\omega_{i,1},\dots,\omega_{i,R})\in\{0,1\}^R$, and assume that $(\omega_i)_{i\in\mathbb N}$ is i.i.d. with

$$\mathbb P(\omega_{i,r}=1)=p_r,\qquad i\in\mathbb N,\ r=1,\dots,R. \tag{3.151}$$

Note that there may be an arbitrary dependence between the $\omega_{i,r}$'s for fixed $i$. This will be harmless in the limit we are interested in below.

Lemma 3.3. Let $\alpha\in(1,\infty)$, $\varepsilon>0$, $(d_1,\dots,d_R)\in[0,1]^R$ with $\sum_{r=1}^Rd_r\le1$, $D_1,\dots,D_R\in\mathbb N$, $C\in\mathbb N$, and put

$$S_C(\omega):=\sum_{\substack{m_1,\dots,m_C\\r_1,\dots,r_C}}^{*}\ \prod_{i=1}^C\big(m_i-m_{i-1}-D_{r_{i-1}}\big)^{-\alpha}, \tag{3.152}$$

where the sum $\sum^*$ extends over all pairs of $C$-tuples $m_0:=0<m_1<\dots<m_C$ from $\mathbb N$ and $(r_1,\dots,r_C)\in\{1,\dots,R\}^C$ (with the convention $D_{r_0}:=0$) satisfying the constraints

$$\begin{aligned}&|\{1\le i\le C\colon r_i=r\}|\ge d_rC,&&r=1,\dots,R,\\&|\{m_i\le k<m_i+D_{r_i}\colon\omega_{k,r_i}=1\}|\ge D_{r_i}(1-\varepsilon),&&i=1,\dots,C.\end{aligned} \tag{3.153}$$
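Then $\omega$-a.s.

$$\limsup_{C\to\infty}\frac1C\log S_C(\omega)\le\inf_{\gamma\in(1/\alpha,1)}\frac1\gamma\Big(\log\zeta(\alpha\gamma)+h(d)+d_0\log R+(\log2)\sum_{r=1}^Rd_rD_r+(1-\varepsilon)\sum_{r=1}^Rd_rD_r\log p_r\Big), \tag{3.154}$$

where $\zeta$ is the Riemann zeta function and $h(d):=-\sum_{r=0}^Rd_r\log d_r$ (with $d_0:=1-d_1-\dots-d_R$) is the entropy of $d$.

Proof. The proof is a variation on the proof of Lemma 2.1. We again estimate fractional moments. For $\gamma\in(1/\alpha,1)$, using $(\sum_ka_k)^\gamma\le\sum_ka_k^\gamma$, we have

$$\mathbb E\big[(S_C)^\gamma\big]\le\sum_{m_1,\dots,m_C}\ \sum^{\prime}_{r_1,\dots,r_C}\ \prod_{i=1}^C\big(m_i-m_{i-1}-D_{r_{i-1}}\big)^{-\alpha\gamma}\ \mathbb P\Big(\bigcap_{i=1}^C\big\{|\{k\in[m_i,m_i+D_{r_i}-1]\colon\omega_{k,r_i}=1\}|\ge(1-\varepsilon)D_{r_i}\big\}\Big), \tag{3.155}$$

where the sum $\sum'$ extends over all $(r_1,\dots,r_C)$ satisfying the constraint in the first line of (3.153). Noting that

$$\mathbb P\big(|\{k\in[m_i,m_i+D_r-1]\colon\omega_{k,r}=1\}|\ge(1-\varepsilon)D_r\big)=\sum_{m=\lceil(1-\varepsilon)D_r\rceil}^{D_r}\binom{D_r}{m}p_r^m(1-p_r)^{D_r-m}\le\big(2p_r^{1-\varepsilon}\big)^{D_r}$$

and

$$\big|\{(r_1,\dots,r_C)\in\{1,\dots,R\}^C\colon\text{at least }d_rC\text{ of the }r_i\text{ equal }r,\ r=1,\dots,R\}\big|\le\exp\big[C\big(d_0\log R+h(d)+o(1)\big)\big],$$

we see from (3.155) (assuming, as we may in the low-density regime of interest, that $2p_r^{1-\varepsilon}\le1$ for all $r$) that

$$\begin{aligned}\mathbb E\big[(S_C)^\gamma\big]&\le\exp\big[C\big(d_0\log R+h(d)+o(1)\big)\big]\times\prod_{r=1}^R\big(2p_r^{1-\varepsilon}\big)^{d_rCD_r}\times\sum_{m_1,\dots,m_C}\prod_{i=1}^C\big(m_i-m_{i-1}-D_{r_{i-1}}\big)^{-\alpha\gamma}\\&\le\exp C\Big[d_0\log R+h(d)+\log\zeta(\alpha\gamma)+\sum_{r=1}^Rd_rD_r\log2+(1-\varepsilon)\sum_{r=1}^Rd_rD_r\log p_r+o(1)\Big],\end{aligned} \tag{3.156}$$

which yields (3.154) as in the proof of Lemma 2.1 (via the Markov inequality and the Borel–Cantelli lemma). $\square$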
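The right-hand side of (3.154) is explicit enough to evaluate numerically. The sketch below replaces the infimum over $\gamma\in(1/\alpha,1)$ by a minimum over an interior grid. It computes $\zeta$ by a partial sum with an Euler–Maclaurin tail correction. All names and parameter values are illustrative assumptions, not taken from the paper. The example shows the mechanism used after (3.150): small $p_r$ (low density) makes the bound strongly negative.

```python
import math


def zeta(s, K=1000):
    """Riemann zeta function for real s > 1: partial sum up to K plus Euler-Maclaurin tail."""
    if s <= 1:
        raise ValueError("need s > 1")
    partial = sum(n ** -s for n in range(1, K + 1))
    return partial + K ** (1 - s) / (s - 1) - K ** -s / 2 + s * K ** (-s - 1) / 12


def entropy(d):
    """h(d) = -sum_{r=0}^R d_r log d_r with d_0 = 1 - sum(d), using 0 log 0 = 0."""
    full = [1 - sum(d)] + list(d)
    return -sum(x * math.log(x) for x in full if x > 0)


def lemma33_bound(alpha, eps, d, D, p, grid=200):
    """Right-hand side of (3.154); the infimum over gamma in (1/alpha, 1) is
    approximated by a minimum over `grid` interior points."""
    R = len(d)
    d0 = 1 - sum(d)
    h = entropy(d)
    sum_dD = sum(dr * Dr for dr, Dr in zip(d, D))
    sum_dD_logp = sum(dr * Dr * math.log(pr) for dr, Dr, pr in zip(d, D, p))
    best = math.inf
    for j in range(1, grid + 1):
        gamma = 1 / alpha + (1 - 1 / alpha) * j / (grid + 1)
        value = (math.log(zeta(alpha * gamma)) + h + d0 * math.log(R)
                 + math.log(2) * sum_dD + (1 - eps) * sum_dD_logp) / gamma
        best = min(best, value)
    return best


print(lemma33_bound(alpha=2.0, eps=0.1, d=[0.5], D=[10], p=[1e-6]))
```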

3.6 Step 6: Weakening the tail assumption

We finally show how to go from (3.3) to (1.1). Suppose that $\rho$ satisfies (1.1) with a certain $\alpha\in(1,\infty)$. Then, for any $\alpha'\in(1,\alpha)$, there is a $C_\rho(\alpha')$ such that (3.3) holds for this $\alpha'$. Hence, as shown in Sections 3.1–3.5, for any $\varepsilon>0$ we can find a neighbourhood $O(Q)\subset\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})$ of $Q$ such that

$$\limsup_{N\to\infty}\frac1N\log\mathbb P(R_N\in O(Q)\mid X)\le-H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})-(\alpha'-1)\,m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})+\varepsilon\qquad X\text{-a.s.} \tag{3.157}$$

The right-hand side is $\le-I^{\mathrm{fin}}(Q)+2\varepsilon$ for $\alpha'$ sufficiently close to $\alpha$, so that we again get (3.1). $\square$

4 Lower bound

The following lower bound will be used in Section 5 to derive the lower bound in the definition of the LDP.

Proposition 4.1. For any $Q\in\mathscr P^{\mathrm{inv,fin}}(\widetilde E^{\mathbb N})$ and any open neighbourhood $U(Q)\subset\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})$ of $Q$,

$$\liminf_{N\to\infty}\frac1N\log\mathbb P(R_N\in U(Q)\mid X)\ge-I^{\mathrm{fin}}(Q)\qquad X\text{-a.s.} \tag{4.1}$$

Proof. Suppose first that $Q\in\mathscr P^{\mathrm{erg,fin}}(\widetilde E^{\mathbb N})$. Then, informally, our strategy runs as follows. In $X$, look for the first string of length $\approx Nm_Q$ that looks typical for $\Psi_Q$. Make the first jump long enough so as to land at the start of this string. Make the remaining $N-1$ jumps typical for $Q$. The probability of this strategy on the exponential scale is the conditional specific relative entropy of word lengths under $Q$ w.r.t. $\rho^{\otimes\mathbb N}$ given the concatenation, i.e., $\approx\exp[N(H_{\tau|K}(Q)+\mathbb E_Q[\log\rho(\tau_1)])]$, times the probability of the first long jump. In order to find a suitable string, we have to skip ahead in $X$ a distance $\approx\exp[Nm_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})]$. By (1.1), the probability of the first jump is therefore $\approx\exp[-N\alpha m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})]$. In view of (1.16) and (1.32), this yields the claim. In the actual proof, it turns out to be technically simpler to employ a slightly different strategy with the same asymptotic cost: we look not only for one contiguous piece of "$\Psi_Q$-typical" letters but for a sequence of $\lceil N/M\rceil$ pieces, each of length $\approx Mm_Q$. Then we let $N\to\infty$, followed by $M\to\infty$.

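More formally, we choose for $O(Q)$ an open neighbourhood $O'\subset O(Q)$ of the type introduced in Section 3.2, and we estimate $\mathbb P(R_N\in O'\mid X)$ from below.

Assume first that $Q$ is ergodic. We can then assume that the neighbourhood $U$ is given by

$$U=\big\{Q'\in\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})\colon(\pi_{L_u}Q')(\zeta_u)\in(a_u,b_u),\ u=1,\dots,U\big\} \tag{4.2}$$

for some $U\in\mathbb N$, $L_1,\dots,L_U\in\mathbb N$, $0<a_u<b_u<1$ and $\zeta_u\in\widetilde E^{L_u}$, $u=1,\dots,U$. As in Section 3.1, by ergodicity of $Q$ we can find for each $\varepsilon>0$ a sufficiently large $M\in\mathbb N$ and a set $\mathcal A=\{z_1,\dots,z_A\}\subset\widetilde E^M$ of "$Q$-typical sentences" satisfying (3.6)–(3.7) (with $\varepsilon_1=\delta_1=\varepsilon$, say), and additionally

$$\frac1M\big|\{0\le j<M-L_u\colon\pi_{L_u}(\tilde\theta^jz_a)=\zeta_u\}\big|\in(a_u,b_u),\qquad a=1,\dots,A,\ u=1,\dots,U. \tag{4.3}$$

Let $\mathcal B:=\kappa(\mathcal A)$. Then from (3.6)–(3.7) we have that, for each $b\in\mathcal B$,

$$|I_b|=|\{z\in\mathcal A\colon\kappa(z)=b\}|\ge\exp\big[M(H_{\tau|K}(Q)-2\varepsilon)\big], \tag{4.4}$$

and

$$\mathbb P(X\text{ begins with some element of }\mathcal B)\ge\exp\big[-Mm_Q\big(H(\Psi_Q\mid\nu^{\otimes\mathbb N})+2\varepsilon\big)\big]. \tag{4.5}$$

Let

$$\begin{aligned}\sigma_1&:=\min\{i\colon\theta^iX\text{ begins with some element of }\mathcal B\},\\\sigma_l&:=\min\{i>\sigma_{l-1}+M(m_Q+\varepsilon)\colon\theta^iX\text{ begins with some element of }\mathcal B\},\qquad l=2,3,\dots\end{aligned} \tag{4.6}$$

Restricting the sum in (3.23) to $0<j_1<\dots<j_N<\infty$ such that $j_1=\sigma_1$, $j_2-j_1,\dots,j_M-j_{M-1}$ are the word lengths corresponding to the $z_a$'s compatible with $\pi_{Mm_Q}(\theta^{\sigma_1}X)$, $j_{M+1}=\sigma_2$, etc., we see that (with $\sigma_0:=0$)

$$\frac1N\log\mathbb P(R_N\in U\mid X)\ge H_{\tau|K}(Q)+\mathbb E_Q[\log\rho(\tau_1)]-3\varepsilon-\frac\alpha N\sum_{l=1}^{\lceil N/M\rceil}\log\big(\sigma_l-\sigma_{l-1}\big) \tag{4.7}$$

for $N$ sufficiently large. Hence $X$-a.s.

$$\begin{aligned}\liminf_{N\to\infty}\frac1N\log\mathbb P(R_N\in U\mid X)&\ge H_{\tau|K}(Q)+\mathbb E_Q[\log\rho(\tau_1)]-3\varepsilon-\frac\alpha M\,\mathbb E[\log\sigma_1]\\&\ge H_{\tau|K}(Q)+\mathbb E_Q[\log\rho(\tau_1)]-\alpha m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})-6\varepsilon\\&=-I^{\mathrm{fin}}(Q)-6\varepsilon,\end{aligned} \tag{4.8}$$

where we have used (4.5) in the second inequality. Now let $\varepsilon\downarrow0$.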
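The mechanism behind (4.5)–(4.8) can be seen in a toy simulation. The gaps between successive "good" starting points $\sigma_l$ have logarithms of the order of $-\log\mathbb P(X\text{ begins with some element of }\mathcal B)$. The sketch makes hypothetical simplifications: the letters are i.i.d. fair bits, and $\mathcal B$ consists of a single string of length 10. So the probability is $2^{-10}$, and the analogue of $Mm_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})$ is $10\log2$. The restart rule mirrors (4.6), with the gap $M(m_Q+\varepsilon)$ replaced by the pattern length.

```python
import math
import random


def hitting_times(x, pattern, gap):
    """sigma_1, sigma_2, ... as in (4.6) for a single target string (0-based positions):
    sigma_1 is the first occurrence of pattern in x, and sigma_l is the first
    occurrence at a position > sigma_{l-1} + gap."""
    times = []
    i = x.find(pattern)
    while i != -1:
        times.append(i)
        i = x.find(pattern, i + gap + 1)
    return times


rng = random.Random(2024)
n_letters = 300_000
x = format(rng.getrandbits(n_letters), f"0{n_letters}b")  # i.i.d. fair letters in {0, 1}
pattern = "0000000001"  # no self-overlap: mean waiting time until completion is 2**10
sigma = hitting_times(x, pattern, gap=len(pattern))
gaps = [b - a for a, b in zip(sigma, sigma[1:])]
mean_log_gap = sum(math.log(g) for g in gaps) / len(gaps)
print(len(gaps), mean_log_gap, len(pattern) * math.log(2))
```

By Jensen's inequality the empirical mean of $\log(\sigma_l-\sigma_{l-1})$ should sit somewhat below $10\log2$. This is consistent with the direction of the second inequality in (4.8), where $\mathbb E[\log\sigma_1]$ is bounded via (4.5).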

It remains to remove the restriction of ergodicity of $Q$, analogously to the proof of Birkner [2], Proposition 2. To that end, assume that $Q\in\mathscr P^{\mathrm{inv,fin}}(\widetilde E^{\mathbb N})$ admits a non-trivial ergodic decomposition. Then, for each $\varepsilon>0$, we can find $Q_1,\dots,Q_R\in\mathscr P^{\mathrm{erg,fin}}(\widetilde E^{\mathbb N})$ and $\lambda_1,\dots,\lambda_R\in(0,1)$ with $\sum_{r=1}^R\lambda_r=1$ such that $\lambda_1Q_1+\dots+\lambda_RQ_R\in U$ and

$$\sum_{r=1}^R\lambda_rI^{\mathrm{fin}}(Q_r)\le I^{\mathrm{fin}}(Q)+\varepsilon \tag{4.9}$$

(for details see Birkner [2], p. 723; employ the fact that both terms in $I^{\mathrm{fin}}$ are affine). For each $r=1,\dots,R$, pick a small neighbourhood $U_r$ of $Q_r$ such that

$$Q_r'\in U_r,\ r=1,\dots,R\ \Longrightarrow\ \sum_{r=1}^R\lambda_rQ_r'\in U. \tag{4.10}$$

Using the above strategy for $Q_1$ for $\lambda_1N$ loops, then the strategy for $Q_2$ for $\lambda_2N$ loops, etc., we see that

$$\liminf_{N\to\infty}\frac1N\log\mathbb P(R_N\in U\mid X)\ge-\sum_{r=1}^R\lambda_rI^{\mathrm{fin}}(Q_r)-6\varepsilon\ge-I^{\mathrm{fin}}(Q)-7\varepsilon. \tag{4.11}$$

$\square$

5 Proof of Theorem 1.2

Proof. The proof comes in 3 steps. We first prove that, for each word length truncation level $tr\in\mathbb N$, the family $\mathbb P([R_N]_{tr}\in\cdot\mid X)$, $N\in\mathbb N$, $X$-a.s. satisfies an LDP on

$$\mathscr P^{\mathrm{inv}}_{tr}(\widetilde E^{\mathbb N}):=\{Q\in\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})\colon Q(|Y^{(1)}|\le tr)=1\} \tag{5.1}$$

(recall (1.11)–(1.13)) with a deterministic rate function $I^{\mathrm{fin}}([Q]_{tr})$ (this is essentially the content of Propositions 4.1 and 3.1). Note that $[Q]_{tr}=Q$ for $Q\in\mathscr P^{\mathrm{inv}}_{tr}(\widetilde E^{\mathbb N})$, and that $\mathscr P^{\mathrm{inv}}_{tr}(\widetilde E^{\mathbb N})$ is a closed subset of $\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})$, in particular, a Polish space under the relative topology (which is again the weak topology). After we have given the proof for fixed $tr$, we let $tr\to\infty$ and use a projective limit argument to complete the proof of Theorem 1.2.
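The truncation $[\cdot]_{tr}$ is defined in (1.11)–(1.13), outside this section. The short sketch below assumes, as our reading of those definitions to be checked against them, that $[y]_{tr}$ keeps the first $\min(|y|,tr)$ letters of a word $y$ and acts wordwise on sentences. Under this assumption every truncated word has length at most $tr$, as required in (5.1). In particular any empirical mean word length is at most $tr$, which is the reason why every $Q\in\mathscr P^{\mathrm{inv}}_{tr}(\widetilde E^{\mathbb N})$ has $m_Q<\infty$.

```python
def truncate_word(y, tr):
    """Assumed form of [y]_tr: keep the first min(|y|, tr) letters of the word y."""
    if tr < 1:
        raise ValueError("truncation level must be a positive integer")
    return y[:tr]


def truncate_sentence(sentence, tr):
    """Apply [.]_tr wordwise to a finite piece of a sentence (a sequence of words)."""
    return [truncate_word(y, tr) for y in sentence]


def mean_word_length(sentence):
    """Empirical mean word length of a finite piece of a sentence."""
    return sum(len(y) for y in sentence) / len(sentence)


words = ["abcde", "xy", "q", "zzzzzzzz"]
truncated = truncate_sentence(words, tr=3)  # ["abc", "xy", "q", "zzz"]
```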

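1. Fix a truncation level $tr\in\mathbb N$. Propositions 4.1 and 3.1 combine to yield the LDP on $\mathscr P^{\mathrm{inv}}_{tr}(\widetilde E^{\mathbb N})$ in the following standard manner. Note that any $Q\in\mathscr P^{\mathrm{inv}}_{tr}(\widetilde E^{\mathbb N})$ satisfies $m_Q<\infty$.

1a. Let $O\subset\mathscr P^{\mathrm{inv}}_{tr}(\widetilde E^{\mathbb N})$ be open. Then, for any $Q\in O$, there is an open neighbourhood $O(Q)\subset\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})$ of $Q$ such that $O(Q)\cap\mathscr P^{\mathrm{inv}}_{tr}(\widetilde E^{\mathbb N})\subset O$. The latter inclusion, together with Proposition 4.1, yields

$$\liminf_{N\to\infty}\frac1N\log\mathbb P([R_N]_{tr}\in O\mid X)\ge-I^{\mathrm{fin}}(Q)\qquad X\text{-a.s.} \tag{5.2}$$

Optimising over $Q\in O$, we get

$$\liminf_{N\to\infty}\frac1N\log\mathbb P([R_N]_{tr}\in O\mid X)\ge-\inf_{Q\in O}I^{\mathrm{fin}}(Q)\qquad X\text{-a.s.} \tag{5.3}$$

Here, note that, since $\mathscr P^{\mathrm{inv}}_{tr}(\widetilde E^{\mathbb N})$ is Polish, it suffices to optimise over a countable set generating the weak topology, allowing us to transfer the $X$-a.s. limit from points to sets (see, e.g., Comets, Section III).

1b. Let $\mathcal K\subset\mathscr P^{\mathrm{inv}}_{tr}(\widetilde E^{\mathbb N})$ be compact. Then there exist $M\in\mathbb N$, $Q_1,\dots,Q_M\in\mathcal K$ and open neighbourhoods $O(Q_1),\dots,O(Q_M)\subset\mathscr P^{\mathrm{inv}}(\widetilde E^{\mathbb N})$ such that $\mathcal K\subset\bigcup_{m=1}^MO(Q_m)$. The latter inclusion, together with Proposition 3.1, yields

$$\limsup_{N\to\infty}\frac1N\log\mathbb P([R_N]_{tr}\in\mathcal K\mid X)\le-\min_{1\le m\le M}I^{\mathrm{fin}}(Q_m)+\varepsilon\qquad X\text{-a.s.}\quad\forall\,\varepsilon>0. \tag{5.4}$$

Extending the infimum to $Q\in\mathcal K$ and letting $\varepsilon\downarrow0$ afterwards, we obtain

$$\limsup_{N\to\infty}\frac1N\log\mathbb P([R_N]_{tr}\in\mathcal K\mid X)\le-\inf_{Q\in\mathcal K}I^{\mathrm{fin}}(Q)\qquad X\text{-a.s.} \tag{5.5}$$

1c. Let $\mathcal C\subset\mathscr P^{\mathrm{inv}}_{tr}(\widetilde E^{\mathbb N})$ be closed. Because $Q\mapsto H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})$ has compact level sets, for any $M<\infty$ the set $\mathcal K_M=\mathcal C\cap\{Q\in\mathscr P^{\mathrm{inv}}_{tr}(\widetilde E^{\mathbb N})\colon H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})\le M\}$ is compact. Hence, doing annealing on $X$ and using (5.5), we get

$$\limsup_{N\to\infty}\frac1N\log\mathbb P([R_N]_{tr}\in\mathcal C\mid X)\le\max\Big\{-M,\ -\inf_{Q\in\mathcal K_M}I^{\mathrm{fin}}(Q)\Big\}\qquad X\text{-a.s.} \tag{5.6}$$

Extending the infimum to $Q\in\mathcal C$ and letting $M\to\infty$ afterwards, we arrive at

$$\limsup_{N\to\infty}\frac1N\log\mathbb P([R_N]_{tr}\in\mathcal C\mid X)\le-\inf_{Q\in\mathcal C}I^{\mathrm{fin}}(Q)\qquad X\text{-a.s.} \tag{5.7}$$

Equations (5.3) and (5.7) complete the proof of the conditional LDP for $[R_N]_{tr}$.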

2. It remains to remove the truncation of word lengths. We know from Step 1 that, for every $\mathrm{tr}\in\mathbb N$, the family $\mathbb P([R_N]_{\rm tr}\in\cdot\mid X)$, $N\in\mathbb N$, satisfies the LDP on $\mathcal P^{\rm inv}([\widetilde E]_{\rm tr}^{\mathbb N})$ with rate function $I^{\rm que}$. Consequently, by the Dawson–Gärtner projective limit theorem (see Dembo and Zeitouni [5], Theorem 4.6.1), the family $\mathbb P(R_N\in\cdot\mid X)$, $N\in\mathbb N$, satisfies the LDP on $\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$ with rate function

$$\widetilde I^{\rm que}(Q)=\sup_{\mathrm{tr}\in\mathbb N}I^{\rm que}([Q]_{\rm tr}),\qquad Q\in\mathcal P^{\rm inv}(\widetilde E^{\mathbb N}).\tag{5.8}$$

The sup may be replaced by a limsup because the truncation may start at any level. For $Q\in\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$ we have $\lim_{\mathrm{tr}\to\infty}I^{\rm que}([Q]_{\rm tr})=I^{\rm que}(Q)$ by Lemma A.1, and so we get the claim if we can show that the limsup can be replaced by a limit, which is done in Step 3. Note that $\widetilde I^{\rm que}$ inherits from $I^{\rm que}$ the properties qualifying it to be a rate function: this is part of the projective limit theorem. For $I^{\rm que}$ these properties are proved in Section 6.

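3. Since $\widetilde I^{\rm que}$ is lower semi-continuous, it is equal to its lower semi-continuous regularisation

$$\overline I^{\rm que}(Q):=\sup_{\mathcal O(Q)}\,\inf_{Q'\in\mathcal O(Q)}\widetilde I^{\rm que}(Q'),\tag{5.9}$$

where the supremum runs over the open neighbourhoods of $Q$. For each $\mathrm{tr}\in\mathbb N$, $[Q]_{\rm tr}\in\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$, while $\text{w-}\lim_{\mathrm{tr}\to\infty}[Q]_{\rm tr}=Q$. So, in particular,

$$\widetilde I^{\rm que}(Q)=\overline I^{\rm que}(Q)\le\sup_{n\in\mathbb N}\,\inf_{\mathrm{tr}\ge n}I^{\rm que}([Q]_{\rm tr})=\liminf_{\mathrm{tr}\to\infty}I^{\rm que}([Q]_{\rm tr}),\tag{5.10}$$

which, combined with the limsup form of (5.8), implies that in fact

$$\widetilde I^{\rm que}(Q)=\lim_{\mathrm{tr}\to\infty}I^{\rm que}([Q]_{\rm tr}),\qquad Q\in\mathcal P^{\rm inv}(\widetilde E^{\mathbb N}).\tag{5.11}$$

□

Lemma A.1 in Appendix A, together with (5.11), shows that $\widetilde I^{\rm que}(Q)=I^{\rm que}(Q)$ for $Q\in\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$, as claimed in the first line of (1.15).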

6 Proof of Theorem 1.3

Proof. The proof comes in 5 steps. 

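1. Every $Q\in\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$ can be decomposed as

$$Q=\int_{\mathcal P^{\rm erg}(\widetilde E^{\mathbb N})}Q'\,W_Q(dQ')\tag{6.1}$$

for some unique probability measure $W_Q$ on $\mathcal P^{\rm erg}(\widetilde E^{\mathbb N})$ (Georgii [7], Proposition 7.22). If $Q\in\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$, then $W_Q$ is concentrated on $\mathcal P^{\rm erg,fin}(\widetilde E^{\mathbb N})$, and so, by the definitions of $m_Q$ and $\Psi_Q$,

$$m_Q=\int m_{Q'}\,W_Q(dQ'),\qquad\Psi_Q=\int\frac{m_{Q'}}{m_Q}\,\Psi_{Q'}\,W_Q(dQ').\tag{6.2}$$

Since $Q\mapsto H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})$ and $\Psi\mapsto H(\Psi\mid\nu^{\otimes\mathbb N})$ are affine (see, e.g., Deuschel and Stroock [6], Example 4.4.41), it follows from (6.1) and (6.2) that

$$I^{\rm que}(Q)=\int I^{\rm que}(Q')\,W_Q(dQ').\tag{6.3}$$

Since $Q\mapsto W_Q$ is affine, (6.3) shows that $I^{\rm que}$ is affine on $\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$.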
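The weights $m_{Q'}/m_Q$ in (6.2) express that randomising the origin favours long words. As a sanity check, here is a minimal Python sketch, assuming a hypothetical $Q$ that mixes two i.i.d. word laws with weights $1/4$ and $3/4$ (i.i.d. laws are ergodic, so these play the role of the components in (6.1)). It computes the one-letter marginal of $\Psi_Q$ directly and via (6.2), in exact rational arithmetic. Only one-letter marginals are compared, and these depend on $Q$ only through the law of the first word; the word laws and helper names are illustrative, not taken from the text.

```python
from fractions import Fraction as F

# Hypothetical ergodic components: i.i.d. word laws, each a map word -> probability.
comp1 = {"a": F(1, 2), "ab": F(1, 4), "bbb": F(1, 4)}
comp2 = {"b": F(1, 3), "aab": F(2, 3)}
components = [comp1, comp2]
weights = [F(1, 4), F(3, 4)]  # W_Q: mass 1/4 on the first component, 3/4 on the second

def mean_length(law):
    """m_Q: expected length of the first word."""
    return sum(p * len(w) for w, p in law.items())

def letter_marginal(law, letter):
    """One-letter marginal of Psi_Q: E_Q[number of `letter` in Y^(1)] / m_Q."""
    return sum(p * w.count(letter) for w, p in law.items()) / mean_length(law)

# Under the mixture Q the first word has the W_Q-averaged word law.
first_word_law = {}
for wt, law in zip(weights, components):
    for w, p in law.items():
        first_word_law[w] = first_word_law.get(w, F(0)) + wt * p

m_Q = mean_length(first_word_law)
m_via_6_2 = sum(wt * mean_length(law) for wt, law in zip(weights, components))
psi_direct = letter_marginal(first_word_law, "a")
psi_via_6_2 = sum(wt * mean_length(law) / m_Q * letter_marginal(law, "a")
                  for wt, law in zip(weights, components))
print(m_Q, m_via_6_2, psi_direct, psi_via_6_2)
```

Exact fractions are used so that the identity can be compared with `==` rather than up to rounding.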

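2. Let $(Q_n)_{n\in\mathbb N}\subset\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$ be such that $\text{w-}\lim_{n\to\infty}Q_n=Q\in\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$. By Proposition 3.1, for any $\varepsilon>0$ we can find an open neighbourhood $\mathcal O(Q)\subset\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$ of $Q$ such that

$$\limsup_{N\to\infty}\frac1N\log\mathbb P\big(R_N\in\mathcal O(Q)\mid X\big)\le-I^{\rm que}(Q)+\varepsilon\qquad X\text{-a.s.}\tag{6.4}$$

On the other hand, for $n$ large enough so that $Q_n\in\mathcal O(Q)$, we have from Proposition 4.1 that

$$\liminf_{N\to\infty}\frac1N\log\mathbb P\big(R_N\in\mathcal O(Q)\mid X\big)\ge-I^{\rm que}(Q_n)\qquad X\text{-a.s.}\tag{6.5}$$

Combining (6.4)–(6.5), we get that, for any $\varepsilon>0$,

$$\liminf_{n\to\infty}I^{\rm que}(Q_n)\ge I^{\rm que}(Q)-\varepsilon.\tag{6.6}$$

Now let $\varepsilon\downarrow0$ to conclude that $I^{\rm que}$ is lower semicontinuous on $\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$ (recall also (5.11)).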
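3. From (1.16) we have (since $\alpha\ge1$ and $H(\Psi_Q\mid\nu^{\otimes\mathbb N})\ge0$)

$$I^{\rm que}(Q)\ge H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N}),\qquad Q\in\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N}).$$

Since $\{Q\in\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})\colon H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})\le C\}$ is compact for all $C<\infty$ (see, e.g., Dembo and Zeitouni [5], Corollary 6.5.15), it follows that $I^{\rm que}$ has compact level sets on $\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$.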

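4. As mentioned at the end of Section 5, $I^{\rm que}$ inherits from its truncated versions that it is lower semicontinuous and has compact level sets. In particular, $I^{\rm que}$ is the lower semicontinuous extension of its restriction to $\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$ to all of $\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$. Moreover, since $I^{\rm que}$ is affine on $\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$ and arises on $\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$ as the truncation limit of its values on $\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$ (recall (5.10)), it follows that $I^{\rm que}$ is affine on $\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$.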

5. It is immediate from (1.15)–(1.16) that $q_{\rho,\nu}^{\otimes\mathbb N}$ is the unique zero of $I^{\rm que}$.

□

7 Proof of Theorem 1.4

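Proof. The extension is an easy generalisation of the proof given in the preceding sections.

(a) Assume that $\rho$ satisfies (1.4) with $\alpha=1$. Since the LDP upper bound holds by the annealed LDP (for $\alpha=1$ the rate function in (1.15) reduces to the annealed rate function $H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})$), it suffices to prove the LDP lower bound. To achieve this, we first show that for any $Q\in\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$ and $\varepsilon>0$ there exists an open neighbourhood $\mathcal O(Q)\subset\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$ of $Q$ such that

$$\liminf_{N\to\infty}\frac1N\log\mathbb P\big(R_N\in\mathcal O(Q)\mid X\big)\ge-I^{\rm que}(Q)-\varepsilon\qquad X\text{-a.s.}\tag{7.1}$$

After that, the extension from $\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$ to $\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$ follows the argument in Section 5.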

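In order to verify (7.1), observe that, by our assumption on $\rho(\cdot)$, for any $\alpha'>1$ there exists a $C_{\alpha'}>0$ such that

$$\rho(n)\ge C_{\alpha'}\,n^{-\alpha'}\qquad\forall\,n\in\operatorname{supp}(\rho).\tag{7.2}$$

Picking $\alpha'$ so close to 1 that $(\alpha'-1)\,m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})<\varepsilon/2$, we can trace through the proof of Proposition 4.1 in Section 4 to construct an open neighbourhood $\mathcal O(Q)\subset\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$ of $Q$ satisfying

$$\begin{aligned}\liminf_{N\to\infty}\frac1N\log\mathbb P\big(R_N\in\mathcal O(Q)\mid X\big)&\ge-H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})-(\alpha'-1)\,m_QH(\Psi_Q\mid\nu^{\otimes\mathbb N})-\varepsilon/2\\&\ge-I^{\rm que}(Q)-\varepsilon\qquad X\text{-a.s.},\end{aligned}\tag{7.3}$$

which is (7.1).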

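(b) We only give a sketch of the argument. Assume $\alpha=\infty$ in (1.4). For $Q\in\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$, the lower bound (which is non-trivial only when $Q\in\mathcal R_\nu$) follows from Birkner [2], Proposition 2, or can alternatively be obtained from the argument in Section 4. Now consider a $Q\in\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$ with $m_Q=\infty$, $H(Q\mid q_{\rho,\nu}^{\otimes\mathbb N})<\infty$ and $\lim_{\mathrm{tr}\to\infty}m_{[Q]_{\rm tr}}H(\Psi_{[Q]_{\rm tr}}\mid\nu^{\otimes\mathbb N})=0$, and let $\mathcal O(Q)\subset\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$ be an open neighbourhood of $Q$. For simplicity, we assume $\operatorname{supp}(\rho)=\mathbb N$. Fix $\varepsilon>0$. We can find a sequence $\delta_N\downarrow0$ such that

$$\max\Big\{-\frac1N\log\rho(n)\colon n\le\big\lceil e^{N\delta_N}\big\rceil\Big\}<\varepsilon.\tag{7.4}$$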

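Furthermore,

for $N>N_0=N_0(\varepsilon,Q)$, and we can find $\mathrm{tr}_0\in\mathbb N$ such that

for $\mathrm{tr}>\mathrm{tr}_0$. Hence

$$H\big([Q]_{\rm tr}\mid q_{\rho,\nu}^{\otimes\mathbb N}\big)\ge H\big(Q\mid q_{\rho,\nu}^{\otimes\mathbb N}\big)-2\varepsilon\qquad\text{for }\mathrm{tr}>\mathrm{tr}_0.\tag{7.7}$$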



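We may also assume that $[Q]_{\rm tr}\in\mathcal O(Q)$ for $\mathrm{tr}>\mathrm{tr}_0$. For a given $N>N_0$, pick $\mathrm{tr}(N)>\mathrm{tr}_0$ so large that $m_{[Q]_{{\rm tr}(N)}}H(\Psi_{[Q]_{{\rm tr}(N)}}\mid\nu^{\otimes\mathbb N})\le\delta_N$. Using the strategy described at the beginning of Section 4, we can construct a neighbourhood $\mathcal O_N\subset\mathcal O(Q)$ of $[Q]_{{\rm tr}(N)}$ such that the conditional probability $\mathbb P(R_N\in\mathcal O_N\mid X)$ is bounded below by

$$\exp\Big[-N\Big(H\big([Q]_{{\rm tr}(N)}\mid q_{\rho,\nu}^{\otimes\mathbb N}\big)+\varepsilon\Big)\Big]\times\text{the cost of the first jump},\tag{7.8}$$

where the first jump takes us to a region of size $\approx N\,m_{[Q]_{{\rm tr}(N)}}$ on which the medium looks "$\Psi_{[Q]_{{\rm tr}(N)}}$-typical". Since, in a typical medium, the size of the first jump will be

$$\approx\exp\Big[N\,m_{[Q]_{{\rm tr}(N)}}H\big(\Psi_{[Q]_{{\rm tr}(N)}}\mid\nu^{\otimes\mathbb N}\big)\Big]\le\exp[N\delta_N],\tag{7.9}$$

we obtain from (7.4)–(7.9) that

$$\mathbb P\big(R_N\in\mathcal O(Q)\mid X\big)\ge\exp\Big[-N\Big(H\big(Q\mid q_{\rho,\nu}^{\otimes\mathbb N}\big)+4\varepsilon\Big)\Big]\tag{7.10}$$

for $N$ large enough.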

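For the upper bound we can argue as follows. For $Q\in\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$ put

$$r(Q):=\limsup_{\mathrm{tr}\to\infty}m_{[Q]_{\rm tr}}H\big(\Psi_{[Q]_{\rm tr}}\mid\nu^{\otimes\mathbb N}\big).\tag{7.11}$$

Since $\rho$ satisfies the bound (3.3) for any $\alpha>1$, we obtain from the upper bound in Theorem 1.2 (applied with this $\alpha$) that the rate function at $Q$ is at least

$$\limsup_{\mathrm{tr}\to\infty}I^{\rm que}([Q]_{\rm tr})=H\big(Q\mid q_{\rho,\nu}^{\otimes\mathbb N}\big)+(\alpha-1)\,r(Q),\tag{7.12}$$

and hence, $\alpha>1$ being arbitrary, equals $\infty$ if $r(Q)>0$. On the other hand, if $r(Q)=0$, then this is simply the annealed bound. □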

8 Proof of Corollary 1.6

Proof. Let $E$ be a Polish space with metric $d_E$ (equipped with its Borel $\sigma$-algebra $\mathscr B_E$). We can choose a sequence of nested finite partitions $\mathscr A_c=\{A_{c,1},\dots,A_{c,n_c}\}$, $c\in\mathbb N$, of $E$ with the property that

$$\forall\,x\in E\colon\qquad\lim_{c\to\infty}\operatorname{diam}(\langle x\rangle_c)=0,\tag{8.1}$$

where the coarse-graining map $\langle\cdot\rangle_c$ maps an element of $E$ to the element of $\mathscr A_c$ it is contained in. Each $E_c=\langle E\rangle_c$ is a finite set, which we equip with the discrete metric $d_c$. Extend $\langle\cdot\rangle_c$ to $E_{c'}$ for each $c'>c$ via $\langle A_{c',i'}\rangle_c=A_{c,i}$ if $A_{c',i'}\subset A_{c,i}$. Then the collection $E_c$, $\langle\cdot\rangle_c$, $c\in\mathbb N$, forms a projective family, and the projective limit

$$F=\big\{(\zeta_1,\zeta_2,\dots)\colon\zeta_c\in E_c,\ \langle\zeta_{c'}\rangle_c=\zeta_c,\ 1\le c<c'\big\}\tag{8.2}$$

is again a Polish space with the metric

$$d_F\big((\zeta_1,\zeta_2,\dots),(\eta_1,\eta_2,\dots)\big):=\sum_{c=1}^\infty2^{-c}\,d_c(\zeta_c,\eta_c).\tag{8.3}$$

We equip $F$ with its Borel $\sigma$-algebra $\mathscr B_F$. We can identify $E$ with a subset of $F$ via $\iota\colon x\mapsto(\langle x\rangle_c)_{c\in\mathbb N}$, since $\iota$ is injective by (8.1). Note that $\iota(E)$ is a measurable subset of $F$ (in general $\iota(E)\neq F$; it is easy to see that $\iota(E)$ is a closed subset of $F$ when $E$ is compact; for non-compact $E$ use the one-point compactification of $E$).
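A concrete instance may help. The Python sketch below assumes the hypothetical case $E=[0,1]$ with the Euclidean metric and the dyadic partitions into intervals of length $2^{-c}$, which are nested and satisfy (8.1) with $\operatorname{diam}(\langle x\rangle_c)\le2^{-c}$. It encodes $\langle x\rangle_c$ by its cell index and evaluates the metric (8.3), truncated at a finite depth. The second pair of points, which are $10^{-9}$ apart in $d_E$ but at $d_F$-distance close to 1, illustrates why the topology generated by $d_F$ on $\iota(E)$ is finer than the one generated by $d_E$. All names are illustrative.

```python
import math

def coarse_index(x: float, c: int) -> int:
    """Index i of the dyadic cell <x>_c of level c containing x in [0, 1]."""
    # Cells are [i 2^-c, (i+1) 2^-c); the last one is closed so that x = 1 is covered.
    return min(math.floor(x * 2**c), 2**c - 1)

def iota(x: float, depth: int) -> list:
    """First `depth` coordinates of the embedding x -> (<x>_c)_{c >= 1} into F."""
    return [coarse_index(x, c) for c in range(1, depth + 1)]

def d_F(zeta: list, eta: list) -> float:
    """The metric (8.3), truncated at len(zeta) levels, with d_c the discrete metric."""
    return sum(2.0**-c * (a != b) for c, (a, b) in enumerate(zip(zeta, eta), start=1))

depth = 20
near = d_F(iota(0.3, depth), iota(0.31, depth))        # cells first differ at level 7
far = d_F(iota(0.5 - 1e-9, depth), iota(0.5, depth))   # cells differ at every level
print(near, far)
```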

Note that the topology generated by $d_F$ on $\iota(E)$ is finer than the original topology generated by $d_E$: by (8.1), for each $x\in E$ and $\varepsilon>0$ there is an $\varepsilon'>0$ such that the $d_F$-ball of radius $\varepsilon'$ around $x$ is contained in the $d_E$-ball of radius $\varepsilon$ around $x$. We will make use of the fact that

$$\text{the trace of }\mathscr B_F\text{ on }\iota(E)\text{ agrees with the image of }\mathscr B_E\text{ under }\iota.\tag{8.4}$$

To check this, note that for any $x\in E$ the function

$$F\ni y\mapsto\begin{cases}d_E\big(x,\iota^{-1}(y)\big),&\text{if }y\in\iota(E),\\\infty,&\text{otherwise},\end{cases}\tag{8.5}$$

can be pointwise approximated by functions that are constant on $\iota(A_{c,i})$, $i=1,\dots,n_c$, and is therefore $\mathscr B_F$-measurable.

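We extend $\langle\cdot\rangle_c$ in the obvious way to $\widetilde E$ and to $E^N$, $\widetilde E^N$, $N\in\mathbb N\cup\{\infty\}$ (via coordinate-wise coarse-graining), then to $\mathcal P(E^N)$, $\mathcal P(\widetilde E^N)$, $N\in\mathbb N$, and finally to $\mathcal P^{\rm inv}(E^{\mathbb N})$ and $\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$ (by taking image measures). Note that $\langle\cdot\rangle_c$ and $[\cdot]_{\rm tr}$ commute, and

$$m_Q=m_{\langle Q\rangle_c},\qquad\langle\Psi_Q\rangle_c=\Psi_{\langle Q\rangle_c},\qquad Q\in\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N}).\tag{8.6}$$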

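By Theorem 1.2, for each $c\in\mathbb N$ the family

$$\mathbb P\big(\langle R_N\rangle_c\in\cdot\mid X\big),\qquad N\in\mathbb N,\tag{8.7}$$

$X$-a.s. satisfies the LDP with deterministic rate function

$$I^{\rm que}_c(Q)=\begin{cases}H\big(Q\mid\langle q_{\rho,\nu}^{\otimes\mathbb N}\rangle_c\big)+(\alpha-1)\,m_Q\,H\big(\Psi_Q\mid\langle\nu^{\otimes\mathbb N}\rangle_c\big),&\text{if }Q\in\mathcal P^{\rm inv,fin}\big(\langle\widetilde E\rangle_c^{\mathbb N}\big),\\\lim_{\mathrm{tr}\to\infty}I^{\rm que}_c([Q]_{\rm tr}),&\text{if }m_Q=\infty.\end{cases}\tag{8.8}$$

Hence, by the Dawson–Gärtner projective limit theorem (see Dembo and Zeitouni [5], Theorem 4.6.1), the family $\mathbb P(R_N\in\cdot\mid X)$, $N\in\mathbb N$, $X$-a.s. satisfies the LDP on $\mathcal P^{\rm inv}(\widetilde F^{\mathbb N})$ with rate function

$$I^{\rm que}_F(Q)=\sup_{c\in\mathbb N}I^{\rm que}_c(\langle Q\rangle_c),\qquad Q\in\mathcal P^{\rm inv}(\widetilde F^{\mathbb N}).\tag{8.9}$$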

The following lemma follows from Deuschel and Stroock [6], Lemma 4.4.15.

Lemma 8.1. Let $G$ be a Polish space, and let $\mathscr A_c=\{A_{c,1},\dots,A_{c,n_c}\}$, $c=1,2,\dots$, be a sequence of nested finite partitions of $G$ such that $\lim_{c\to\infty}\operatorname{diam}(\langle x\rangle_c)=0$ for all $x\in G$ (with a coarse-graining map defined as above). Then, for $\mu,\nu\in\mathcal P(G)$,

$$h\big(\langle\mu\rangle_c\mid\langle\nu\rangle_c\big)\uparrow h(\mu\mid\nu)\qquad\text{as }c\to\infty.\tag{8.10}$$
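Lemma 8.1 states that relative entropy increases monotonically along refinements of the partition. A minimal finite stand-in illustrates (8.10): the Python sketch below assumes the hypothetical space $G=\{0,\dots,15\}$ with nested partitions into blocks of sizes 16, 8, 4, 2, 1 and two arbitrary distributions $\mu,\nu$. On a finite set the partitions reach singletons after finitely many steps, so the limit $h(\mu\mid\nu)$ is attained exactly; the monotonicity itself is the data-processing inequality for relative entropy.

```python
import math

def rel_entropy(mu, nu):
    """h(mu | nu) = sum_i mu_i log(mu_i / nu_i) for distributions on a finite set."""
    return sum(m * math.log(m / n) for m, n in zip(mu, nu) if m > 0)

def coarse_grain(p, block):
    """Image of p under the partition of {0, ..., len(p) - 1} into blocks of size `block`."""
    return [sum(p[i:i + block]) for i in range(0, len(p), block)]

size = 16
mu = [(i + 1) / 136 for i in range(size)]   # 136 = 1 + 2 + ... + 16
nu = [1 / size] * size

# Level c uses blocks of size 16 / 2^c: the partitions are nested and reach singletons at c = 4.
levels = [rel_entropy(coarse_grain(mu, size // 2**c), coarse_grain(nu, size // 2**c))
          for c in range(5)]
print([round(v, 4) for v in levels], round(rel_entropy(mu, nu), 4))
```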

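Let

$$\mathcal P^{\rm inv}_\iota(F^{\mathbb N}):=\big\{\Psi\in\mathcal P^{\rm inv}(F^{\mathbb N})\colon\Psi\big(\iota(E)^{\mathbb N}\big)=1\big\},\tag{8.11}$$

$$\mathcal P^{\rm inv}_\iota(\widetilde F^{\mathbb N}):=\big\{Q\in\mathcal P^{\rm inv}(\widetilde F^{\mathbb N})\colon Q\big(\widetilde{\iota(E)}^{\mathbb N}\big)=1\big\}.\tag{8.12}$$

Note that $\iota$ allows us to view each $\Psi\in\mathcal P^{\rm inv}(E^{\mathbb N})$ as an element of $\mathcal P^{\rm inv}_\iota(F^{\mathbb N})$ and each $Q\in\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$ as an element of $\mathcal P^{\rm inv}_\iota(\widetilde F^{\mathbb N})$, via the identification of $E$ and $\iota(E)\subset F$. In particular, we can view $\nu^{\otimes\mathbb N}$ as an element of $\mathcal P^{\rm inv}_\iota(F^{\mathbb N})$ and $q_{\rho,\nu}^{\otimes\mathbb N}$ as an element of $\mathcal P^{\rm inv}_\iota(\widetilde F^{\mathbb N})$. We will make use of the fact that, since each real-valued $d_E$-continuous function on $\iota(E)$ is automatically $d_F$-continuous, the weak topology on $\mathcal P^{\rm inv}_\iota(\widetilde F^{\mathbb N})$ is finer than the weak topology on $\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$.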
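Fix $Q\in\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$. Note that the functions

$$(N,c)\mapsto\frac1N\,h\big(\langle\pi_NQ\rangle_c\mid\langle q_{\rho,\nu}^{\otimes N}\rangle_c\big)\qquad\text{and}\qquad(L,c)\mapsto\frac1L\,h\big(\langle\pi_L\Psi_Q\rangle_c\mid\langle\nu^{\otimes L}\rangle_c\big)\tag{8.13}$$

are non-decreasing in both coordinates. Then deduce from (8.9) and (1.16) that

$$\begin{aligned}I^{\rm que}_F(Q)&=\sup_{c\in\mathbb N}\Big\{H\big(\langle Q\rangle_c\mid\langle q_{\rho,\nu}^{\otimes\mathbb N}\rangle_c\big)+(\alpha-1)\,m_{\langle Q\rangle_c}H\big(\Psi_{\langle Q\rangle_c}\mid\langle\nu^{\otimes\mathbb N}\rangle_c\big)\Big\}\\&=\sup_{c\in\mathbb N}\Big\{\sup_{N\in\mathbb N}\frac1N\,h\big(\langle\pi_NQ\rangle_c\mid\langle q_{\rho,\nu}^{\otimes N}\rangle_c\big)+(\alpha-1)\,m_Q\sup_{L\in\mathbb N}\frac1L\,h\big(\langle\pi_L\Psi_Q\rangle_c\mid\langle\nu^{\otimes L}\rangle_c\big)\Big\}\\&=\sup_{N\in\mathbb N}\frac1N\sup_{c\in\mathbb N}h\big(\langle\pi_NQ\rangle_c\mid\langle q_{\rho,\nu}^{\otimes N}\rangle_c\big)+(\alpha-1)\,m_Q\sup_{L\in\mathbb N}\frac1L\sup_{c\in\mathbb N}h\big(\langle\pi_L\Psi_Q\rangle_c\mid\langle\nu^{\otimes L}\rangle_c\big)\\&=\sup_{N\in\mathbb N}\frac1N\,h\big(\pi_NQ\mid q_{\rho,\nu}^{\otimes N}\big)+(\alpha-1)\,m_Q\sup_{L\in\mathbb N}\frac1L\,h\big(\pi_L\Psi_Q\mid\nu^{\otimes L}\big)\\&=H\big(Q\mid q_{\rho,\nu}^{\otimes\mathbb N}\big)+(\alpha-1)\,m_Q\,H\big(\Psi_Q\mid\nu^{\otimes\mathbb N}\big),\end{aligned}\tag{8.14}$$

where we have used Lemma 8.1 in the fourth line. Note that in the third line interchanging the suprema and splitting the supremum of the sum into the sum of the suprema is justified because of (8.13).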
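The monotonicity in $N$ claimed in (8.13) is, for a product reference measure, the classical fact that the block entropy per letter of a stationary sequence is non-increasing in the block length. The Python sketch below checks it by brute-force enumeration for a hypothetical stationary two-state Markov chain against the product reference law $q^{\otimes N}$ with $q$ uniform; the alphabet is already finite, so no coarse-graining is involved. The chain, the reference law and all names are assumptions made for the illustration.

```python
import itertools
import math

# Hypothetical stationary Markov chain on {0, 1}: P[a][b] = P(X_{k+1} = b | X_k = a).
P = [[0.9, 0.1], [0.3, 0.7]]
pi = [0.75, 0.25]        # stationary law, since 0.75 * 0.1 == 0.25 * 0.3
q = [0.5, 0.5]           # single-letter reference law; the reference measure is q^N

def block_prob(x):
    """P(X_1 = x_1, ..., X_N = x_N) for the stationary chain."""
    p = pi[x[0]]
    for a, b in zip(x, x[1:]):
        p *= P[a][b]
    return p

def scaled_rel_entropy(N):
    """(1/N) h(pi_N Q | q^N), by brute-force enumeration of {0, 1}^N."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=N):
        px = block_prob(x)
        total += px * math.log(px / math.prod(q[a] for a in x))
    return total / N

values = [scaled_rel_entropy(N) for N in range(1, 9)]
print([round(v, 5) for v in values])
```

The values increase towards the specific relative entropy $\log2$ minus the entropy rate of the chain.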
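For $Q\in\mathcal P^{\rm inv}_\iota(\widetilde F^{\mathbb N})$ with $m_Q=\infty$ we see from (8.8), (8.9) and (8.14) that

$$\begin{aligned}I^{\rm que}_F(Q)=\sup_{c\in\mathbb N}I^{\rm que}_c(\langle Q\rangle_c)&=\sup_{c\in\mathbb N}\,\sup_{\mathrm{tr}\in\mathbb N}\Big\{H\big([\langle Q\rangle_c]_{\rm tr}\mid\langle q_{\rho,\nu}^{\otimes\mathbb N}\rangle_c\big)+(\alpha-1)\,m_{[Q]_{\rm tr}}H\big(\Psi_{[\langle Q\rangle_c]_{\rm tr}}\mid\langle\nu^{\otimes\mathbb N}\rangle_c\big)\Big\}\\&=\sup_{\mathrm{tr}\in\mathbb N}\Big\{H\big([Q]_{\rm tr}\mid q_{\rho,\nu}^{\otimes\mathbb N}\big)+(\alpha-1)\,m_{[Q]_{\rm tr}}H\big(\Psi_{[Q]_{\rm tr}}\mid\nu^{\otimes\mathbb N}\big)\Big\}.\end{aligned}\tag{8.15}$$

Note that $\sup_{\mathrm{tr}\in\mathbb N}$ can be replaced by $\lim_{\mathrm{tr}\to\infty}$ by arguments analogous to Step 3 in the proof of Theorem 1.2.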

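Finally, we transfer the LDP from $\mathcal P^{\rm inv}(\widetilde F^{\mathbb N})$ to $\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$. To this end, we first verify that the rate function is concentrated on $\mathcal P^{\rm inv}_\iota(\widetilde F^{\mathbb N})$. Put

$$\widetilde F'':=\big\{y\in\widetilde F\colon y\text{ contains at least one letter from }F\setminus\iota(E)\big\}.\tag{8.16}$$

Then $q_{\rho,\nu}(\widetilde F'')=0$. For $Q\in\mathcal P^{\rm inv}(\widetilde F^{\mathbb N})\setminus\mathcal P^{\rm inv}_\iota(\widetilde F^{\mathbb N})$ we have $\pi_1Q(\widetilde F'')>0$, and hence

$$I^{\rm que}_F(Q)\ge H\big(Q\mid q_{\rho,\nu}^{\otimes\mathbb N}\big)\ge h\big(\pi_1Q\mid q_{\rho,\nu}\big)=\infty.\tag{8.17}$$

Thus, by Dembo and Zeitouni [5], Lemma 4.1.5, the family $\mathbb P(R_N\in\cdot\mid X)$ satisfies, for $\nu^{\otimes\mathbb N}$-a.s. all $X$, an LDP on $\mathcal P^{\rm inv}_\iota(\widetilde F^{\mathbb N})$ with rate $N$ and with rate function given by (1.15)–(1.16).

To conclude the proof, observe that we can identify $\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$ and $\mathcal P^{\rm inv}_\iota(\widetilde F^{\mathbb N})$, and that the weak topology on $\mathcal P^{\rm inv}(\widetilde E^{\mathbb N})$, which is 'built' on $d_E$, is not finer than the one that $\mathcal P^{\rm inv}_\iota(\widetilde F^{\mathbb N})$ inherits from $\mathcal P^{\rm inv}(\widetilde F^{\mathbb N})$, which is 'built' on $d_F$ (recall the discussion following (8.11)–(8.12)). Consequently, the LDP carries over. □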

A Appendix: Continuity under truncation limits 

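The following lemma implies (1.17).

Lemma A.1. For all $Q\in\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$,

$$\lim_{\mathrm{tr}\to\infty}H\big([Q]_{\rm tr}\mid q_{\rho,\nu}^{\otimes\mathbb N}\big)=H\big(Q\mid q_{\rho,\nu}^{\otimes\mathbb N}\big),\qquad\lim_{\mathrm{tr}\to\infty}m_{[Q]_{\rm tr}}H\big(\Psi_{[Q]_{\rm tr}}\mid\nu^{\otimes\mathbb N}\big)=m_Q\,H\big(\Psi_Q\mid\nu^{\otimes\mathbb N}\big).\tag{A.1}$$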

Proof. The proof is not quite standard, because $Q$ and $[Q]_{\rm tr}$, respectively, $\Psi_Q$ and $\Psi_{[Q]_{\rm tr}}$, are not "$\bar d$-close" when tr is large, so that we cannot use the fact that entropy is "$\bar d$-continuous" (see Shields [10]).

Lower semi-continuity yields $\liminf_{\mathrm{tr}\to\infty}\text{l.h.s.}\ge\text{r.h.s.}$ for both limits, so we need only prove the reverse inequality. Note that, for all $Q\in\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$,

$$H(Q)\le h(\pi_1Q)\le h\big(\mathscr L_Q(\tau_1)\big)+m_Q\log|E|<\infty,\qquad H(\Psi_Q)\le\log|E|<\infty.\tag{A.2}$$

For $Z$ a random variable, we write $\mathscr L_Q(Z)$ to denote the law of $Z$ under $Q$.

A.1 Proof of first half of (A.1)

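Proof. Since $q_{\rho,\nu}^{\otimes\mathbb N}$ is a product measure, we have, for any $\mathrm{tr}\in\mathbb N$,

$$\begin{aligned}H\big([Q]_{\rm tr}\mid q_{\rho,\nu}^{\otimes\mathbb N}\big)&=-H([Q]_{\rm tr})-E_{[Q]_{\rm tr}}\big[\log\rho(\tau_1)\big]-E_{[Q]_{\rm tr}}\Big[\sum_{i=1}^{\tau_1}\log\nu\big(Y^{(1)}_i\big)\Big]\\&=-H([Q]_{\rm tr})-E_Q\big[\log\rho(\tau_1\wedge\mathrm{tr})\big]-E_Q\Big[\sum_{i=1}^{\tau_1\wedge\mathrm{tr}}\log\nu\big(Y^{(1)}_i\big)\Big].\end{aligned}\tag{A.3}$$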



By dominated convergence, using that $m_Q<\infty$ and $|\log\rho(n)|\le C\log(n+1)$ for some $C<\infty$, we see that as $\mathrm{tr}\to\infty$ the last two terms in the second line converge to

$$-E_Q\big[\log\rho(\tau_1)\big]-E_Q\Big[\sum_{i=1}^{\tau_1}\log\nu\big(Y^{(1)}_i\big)\Big].\tag{A.4}$$

Thus, it remains to check that

$$\lim_{\mathrm{tr}\to\infty}H([Q]_{\rm tr})=H(Q).\tag{A.5}$$
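When $Q$ is itself a product measure $q_{\rho',\nu'}^{\otimes\mathbb N}$, specific entropies reduce to single-word quantities, and the first half of (A.1) becomes $h(\mathscr L(\tau_1\wedge\mathrm{tr})\mid\rho)+E[\tau_1\wedge\mathrm{tr}]\,h(\nu'\mid\nu)\to h(\rho'\mid\rho)+m_Q\,h(\nu'\mid\nu)$. The Python sketch below evaluates both sides in that special case, under hypothetical choices that do not come from the text: a geometric observed length law $\rho'$, the algebraic-tail reference law $\rho(n)=1/(n(n+1))$ (tail exponent 2), and a two-letter alphabet.

```python
import math

p = 0.4                     # hypothetical observed length law rho'(n) = p (1 - p)^(n - 1)
nu = [0.5, 0.5]             # reference letter law on E = {0, 1}
nu_obs = [0.7, 0.3]         # hypothetical observed letter law nu'

def rho(n):
    """Reference word-length law rho(n) = 1 / (n (n + 1)): algebraic tail, sums to 1."""
    return 1.0 / (n * (n + 1))

def rho_obs(n):
    return p * (1 - p) ** (n - 1)

h_letters = sum(a * math.log(a / b) for a, b in zip(nu_obs, nu))   # h(nu' | nu)

def H_truncated(tr):
    """H([Q]_tr | q_{rho,nu}^N) for the product measure Q = q_{rho',nu'}^N.

    A truncated word has length min(tau_1, tr) (mass P(tau_1 >= tr) sits at tr),
    and given its length its letters are still i.i.d. with law nu'.
    """
    tail = (1 - p) ** (tr - 1)
    lengths = sum(rho_obs(n) * math.log(rho_obs(n) / rho(n)) for n in range(1, tr))
    lengths += tail * math.log(tail / rho(tr))
    mean_trunc = sum(n * rho_obs(n) for n in range(1, tr)) + tr * tail   # E[min(tau_1, tr)]
    return lengths + mean_trunc * h_letters

# Untruncated value h(rho' | rho) + m_Q h(nu' | nu); terms with n >= 1000 are negligible.
H_full = sum(rho_obs(n) * math.log(rho_obs(n) / rho(n)) for n in range(1, 1000)) + h_letters / p
for tr in (1, 2, 4, 8, 16, 32, 64):
    print(tr, round(H_truncated(tr), 8))
print("tr -> infinity:", round(H_full, 8))
```

The sums stop at $n=1000$ because $\rho'(n)$ underflows to zero for much larger $n$, which would make the logarithm undefined.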



Obviously, $H([Q]_{\rm tr})\le H(Q)$ for all $\mathrm{tr}\in\mathbb N$ (indeed, $h(\pi_N[Q]_{\rm tr})\le h(\pi_NQ)$ for all $N,\mathrm{tr}\in\mathbb N$, because $[Q]_{\rm tr}$ is the image measure of $Q$ under the truncation map). For the asymptotic converse, we argue as follows. A decomposition of entropy gives

$$h(\pi_NQ)=h(\pi_N[Q]_{\rm tr})+\int h\big(\mathscr L_Q(\pi_NY\mid\pi_N[Y]_{\rm tr}=z)\big)\,(\pi_N[Q]_{\rm tr})(dz),\tag{A.6}$$

where $\pi_N$ is the projection onto the first $N$ words, and $\mathscr L_Q(\pi_NY\mid\pi_N[Y]_{\rm tr}=z)$ is the conditional distribution of the first $N$ words given their truncations. We have

$$h\big(\mathscr L_Q(\pi_NY\mid\pi_N[Y]_{\rm tr}=z)\big)\le\sum_{i=1}^Nh\big(\mathscr L_Q(Y^{(i)}\mid\pi_N[Y]_{\rm tr}=z)\big)\tag{A.7}$$

and

$$\begin{aligned}\int h\big(\mathscr L_Q(Y^{(i)}\mid\pi_N[Y]_{\rm tr}=z)\big)\,(\pi_N[Q]_{\rm tr})(dz)&\le\int h\big(\mathscr L_Q(Y^{(i)}\mid[Y^{(i)}]_{\rm tr}=z_i)\big)\,(\pi_N[Q]_{\rm tr})(dz)\\&=\int h\big(\mathscr L_Q(Y^{(1)}\mid[Y^{(1)}]_{\rm tr}=y)\big)\,(\pi_1[Q]_{\rm tr})(dy),\qquad1\le i\le N,\end{aligned}\tag{A.8}$$

where the inequality comes from the fact that conditioning on less increases entropy, and the equality uses the shift-invariance. Combining (A.6)–(A.8), dividing by $N$ and letting $N\to\infty$, we obtain

$$H(Q)\le H([Q]_{\rm tr})+\int h\big(\mathscr L_Q(Y^{(1)}\mid[Y^{(1)}]_{\rm tr}=y)\big)\,(\pi_1[Q]_{\rm tr})(dy),\tag{A.9}$$

and so it remains to check that the second term in the right-hand side vanishes as $\mathrm{tr}\to\infty$.
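The decomposition (A.6) is the chain rule for entropy applied to the truncation map. The Python sketch below checks its one-word case $N=1$ for a hypothetical law of $Y^{(1)}$ on a handful of words over $\{a,b\}$, and also shows that the conditional term appearing in (A.9) is non-increasing in tr and vanishes once tr is at least the maximal word length. The word law and helper names are illustrative assumptions.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy (nats) of a finite collection of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical law of the first word Y^(1) over the letters {'a', 'b'}.
word_law = {"a": 0.2, "b": 0.1, "ab": 0.15, "ba": 0.05, "abb": 0.2, "aba": 0.1, "bbab": 0.2}

def decomposition(law, tr):
    """Return (h(Y), h([Y]_tr), average conditional entropy of Y given [Y]_tr)."""
    cells = defaultdict(dict)                 # truncated word -> {full word: probability}
    for w, p in law.items():
        cells[w[:tr]][w] = p                  # the truncation map keeps the first tr letters
    h_trunc = entropy(sum(c.values()) for c in cells.values())
    h_cond = sum(sum(c.values()) * entropy(p / sum(c.values()) for p in c.values())
                 for c in cells.values())
    return entropy(law.values()), h_trunc, h_cond

for tr in (1, 2, 3, 4):
    h_full, h_trunc, h_cond = decomposition(word_law, tr)
    print(tr, round(h_full, 6), round(h_trunc + h_cond, 6), round(h_cond, 6))
```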

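Note that this term equals (write $\epsilon$ for the empty word, $w\cdot w'$ for the concatenation of words $w$ and $w'$, and $Q(w)$, $[Q]_{\rm tr}(w)$ for the probabilities that the first word equals $w$)

$$\begin{aligned}&-\sum_{\substack{w\in\widetilde E\\\tau(w)=\mathrm{tr}}}[Q]_{\rm tr}(w)\sum_{w'\in\widetilde E\cup\{\epsilon\}}\frac{Q(w\cdot w')}{[Q]_{\rm tr}(w)}\log\frac{Q(w\cdot w')}{[Q]_{\rm tr}(w)}\\&\qquad=-\sum_{\substack{w''\in\widetilde E\\\tau(w'')\ge\mathrm{tr}}}Q(w'')\log Q(w'')+\sum_{\substack{w''\in\widetilde E\\\tau(w'')\ge\mathrm{tr}}}Q(w'')\log[Q]_{\rm tr}\big([w'']_{\rm tr}\big).\end{aligned}\tag{A.10}$$

But, since $Q(w'')\le[Q]_{\rm tr}([w'']_{\rm tr})\le1$,

$$0\ge\sum_{\substack{w''\in\widetilde E\\\tau(w'')\ge\mathrm{tr}}}Q(w'')\log[Q]_{\rm tr}\big([w'']_{\rm tr}\big)\ge\sum_{\substack{w''\in\widetilde E\\\tau(w'')\ge\mathrm{tr}}}Q(w'')\log Q(w''),\tag{A.11}$$

and the right-most sum is a tail of the series $\sum_{w''}Q(w'')\log Q(w'')=-h(\pi_1Q)$, which converges by (A.2). Hence the right-hand side of (A.10) vanishes as $\mathrm{tr}\to\infty$.

□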
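The two sums in (A.10)–(A.11) can be checked in the simplest setting. The Python sketch below assumes a one-letter alphabet, so that a word is determined by its length and the only cell of the truncation map with a non-trivial conditional law is $\{\tau_1\ge\mathrm{tr}\}$, and it takes a hypothetical length law proportional to $n^{-3}$ on $\{1,\dots,10^5\}$ (finite mean, finite entropy). It evaluates the conditional-entropy term of (A.9) directly and via the right-hand side of (A.10), and shows the term decaying as tr grows.

```python
import math

M = 100_000                           # hypothetical finite support {1, ..., M} for the word length
weights = [n ** -3.0 for n in range(1, M + 1)]
Z = sum(weights)
Q = [w / Z for w in weights]          # Q[n - 1] = P(tau_1 = n): a law with finite mean
suffix = [0.0] * (M + 1)              # suffix[k] = P(tau_1 >= k + 1)
for k in range(M - 1, -1, -1):
    suffix[k] = suffix[k + 1] + Q[k]

def term_direct(tr):
    """P(tau_1 >= tr) * h(L(tau_1 | tau_1 >= tr)); all other cells are single points."""
    T = suffix[tr - 1]
    return -T * sum((x / T) * math.log(x / T) for x in Q[tr - 1:])

def term_A10(tr):
    """Right-hand side of (A.10), where [Q]_tr([w'']_tr) = P(tau_1 >= tr)."""
    T = suffix[tr - 1]
    tail = Q[tr - 1:]
    return -sum(x * math.log(x) for x in tail) + sum(x * math.log(T) for x in tail)

for tr in (1, 10, 100, 1000):
    print(tr, term_direct(tr), term_A10(tr))
```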



A.2 Proof of second half of (A.1)



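Note that $\lim_{\mathrm{tr}\to\infty}m_{[Q]_{\rm tr}}=m_Q$ and $\text{w-}\lim_{\mathrm{tr}\to\infty}\Psi_{[Q]_{\rm tr}}=\Psi_Q$ by dominated convergence, implying that

$$\liminf_{\mathrm{tr}\to\infty}H\big(\Psi_{[Q]_{\rm tr}}\mid\nu^{\otimes\mathbb N}\big)\ge H\big(\Psi_Q\mid\nu^{\otimes\mathbb N}\big).\tag{A.12}$$

So it remains to check the reverse inequality. Since $\nu^{\otimes\mathbb N}$ is a product measure, we have

$$H\big(\Psi_{[Q]_{\rm tr}}\mid\nu^{\otimes\mathbb N}\big)=-H\big(\Psi_{[Q]_{\rm tr}}\big)-\frac1{m_{[Q]_{\rm tr}}}E_Q\Big[\sum_{i=1}^{\tau_1\wedge\mathrm{tr}}\log\nu\big(Y^{(1)}_i\big)\Big].\tag{A.13}$$

By dominated convergence, as $\mathrm{tr}\to\infty$ the second term converges to

$$-\frac1{m_Q}E_Q\Big[\sum_{i=1}^{\tau_1}\log\nu\big(Y^{(1)}_i\big)\Big]=-\int_{E^{\mathbb N}}\Psi_Q(dx)\,\log\nu(x_1).\tag{A.14}$$

Thus, it remains to check that

$$\lim_{\mathrm{tr}\to\infty}H\big(\Psi_{[Q]_{\rm tr}}\big)=H(\Psi_Q).\tag{A.15}$$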
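Formulas (A.13)–(A.14) rest on the identity $E_{\Psi_Q}[f(X_1)]=m_Q^{-1}E_Q\big[\sum_{i=1}^{\tau_1}f(Y^{(1)}_i)\big]$: randomising the origin of the concatenation picks a word with probability proportional to its length. The Monte Carlo sketch below (Python, fixed seed, hypothetical i.i.d. word law) compares this formula for $f=1_{\{a\}}$ with the letter frequency along a long concatenation, and with the naive per-word average, which ignores the size bias and gives a different answer.

```python
import random

random.seed(1)
# Hypothetical i.i.d. word law over the letters {'a', 'b'}: (word, probability) pairs.
word_law = [("a", 0.5), ("abb", 0.3), ("bbbbba", 0.2)]

m_Q = sum(p * len(w) for w, p in word_law)                     # mean word length, about 2.6
exact_a = sum(p * w.count("a") for w, p in word_law) / m_Q      # (1/m_Q) E_Q[# of a's in Y^(1)]
naive_a = sum(p * w.count("a") / len(w) for w, p in word_law)   # per-word average: NOT Psi_Q

# Concatenate many sampled words; by the ergodic theorem the letter frequency along the
# concatenation approximates the one-letter marginal of Psi_Q.
sample = random.choices([w for w, _ in word_law], weights=[p for _, p in word_law], k=200_000)
letters = "".join(sample)
empirical_a = letters.count("a") / len(letters)
print(round(exact_a, 4), round(empirical_a, 4), round(naive_a, 4))
```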



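We will first prove (A.15) for ergodic $Q$, in which case $[Q]_{\rm tr}$, $\Psi_Q$ and $\Psi_{[Q]_{\rm tr}}$ are ergodic (Birkner [2], Remark 5).

For $\Psi\in\mathcal P^{\rm erg}(E^{\mathbb N})$ and $\varepsilon\in(0,1)$, let

$$\mathcal N_n(\Psi,\varepsilon):=\min\big\{|A|\colon A\subset E^n,\ \Psi(A\times E^{\mathbb N})\ge1-\varepsilon\big\}\tag{A.16}$$

be the $(n,\varepsilon)$ covering number of $\Psi$. For any $\varepsilon\in(0,1)$, we have

$$\lim_{n\to\infty}\frac1n\log\mathcal N_n(\Psi,\varepsilon)=H(\Psi)\tag{A.17}$$

(see Shields [10], Theorem 1.7.4). The idea behind (A.15) is that there are $\approx\exp[nH(\Psi_Q)]$ "$\Psi_Q$-typical" sequences of length $n$, and that a "$\Psi_{[Q]_{\rm tr}}$-typical" sequence arises from a "$\Psi_Q$-typical" sequence by eliminating a fraction $\delta_{\rm tr}$ of the letters, where $\delta_{\rm tr}\to0$ as $\mathrm{tr}\to\infty$. Hence $\mathcal N_n(\Psi_Q,\varepsilon)$ cannot be much larger than $\mathcal N_n(\Psi_{[Q]_{\rm tr}},\varepsilon)$ (on an exponential scale), implying that $H(\Psi_Q)-H(\Psi_{[Q]_{\rm tr}})$ must be small.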
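The asymptotic cover property (A.17) can be observed in the simplest ergodic case. For a hypothetical i.i.d. Bernoulli($p$) letter process with $p<1/2$, the most probable sequences are those with the fewest ones, so the covering number (A.16) can be computed exactly by a greedy count over type classes. The Python sketch below compares $\frac1n\log\mathcal N_n$ with the entropy $H=-p\log p-(1-p)\log(1-p)$. The convergence is slow (the correction is of order $(\log n)/n$), so only rough agreement is expected at these values of $n$.

```python
import math

def log_covering_number(n, p, eps):
    """log N_n(Psi, eps) as in (A.16) for i.i.d. Bernoulli(p) letters with p < 1/2.

    A sequence with k ones has probability p^k (1-p)^(n-k), which decreases in k,
    so a smallest set of mass >= 1 - eps takes all sequences with 0, 1, ... ones
    and then just enough sequences from the next class.
    """
    count, mass = 0, 0.0
    for k in range(n + 1):
        size = math.comb(n, k)
        each = p**k * (1 - p) ** (n - k)
        if mass + size * each >= 1 - eps:
            count += math.ceil((1 - eps - mass) / each)
            return math.log(count)
        count += size
        mass += size * each
    return math.log(count)

p, eps = 0.2, 0.5
H = -(p * math.log(p) + (1 - p) * math.log(1 - p))   # entropy per letter, in nats
for n in (50, 100, 200, 400):
    print(n, round(log_covering_number(n, p, eps) / n, 4), "vs H =", round(H, 4))
```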

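To make this argument precise, fix $\varepsilon>0$ and pick $N_0$ so large that

$$Q\Big(\big|\kappa(Y^{(1)},\dots,Y^{(N)})\big|\in Nm_Q[1-\varepsilon,1+\varepsilon]\Big)\ge1-\varepsilon\qquad\text{for }N\ge N_0.\tag{A.18}$$

Pick $\mathrm{tr}_0\in\mathbb N$ so large that, for $\mathrm{tr}\ge\mathrm{tr}_0$ and $N\ge N_0$,

$$Q\Big(\sum_{i=1}^N(\tau_i-\mathrm{tr})^+\le N\varepsilon\Big)\ge1-\frac\varepsilon2,\qquad Q(\tau_1\le\mathrm{tr})\ge1-\frac\varepsilon2,\qquad m_{[Q]_{\rm tr}}\ge(1-\varepsilon)\,m_Q.\tag{A.19}$$

For $n\ge N_0m_Q$, we will construct a set $B\subset E^n$ such that

$$\Psi_Q(B\times E^{\mathbb N})\ge\tfrac12,\qquad|B|\le\exp\big[n\big(H(\Psi_{[Q]_{\rm tr}})+\delta\big)\big],\tag{A.20}$$

where $\delta$ can be made arbitrarily small by choosing $\varepsilon$ small in (A.18)–(A.19). Hence, by the asymptotic cover property (A.17), we have $H(\Psi_Q)\le H(\Psi_{[Q]_{\rm tr}})+\delta$ for all $\mathrm{tr}\ge\mathrm{tr}_0$, and so

$$\liminf_{\mathrm{tr}\to\infty}H\big(\Psi_{[Q]_{\rm tr}}\big)\ge H(\Psi_Q),\tag{A.21}$$

which is the non-trivial half of (A.15).

We verify (A.20) as follows. Put $N:=\lceil(n/m_Q)(1+2\varepsilon)\rceil$. By (A.18)–(A.19) and the asymptotic cover property (A.17) for $\Psi_{[Q]_{\rm tr}}$, there is a set $A\subset\widetilde E^N$ such that

$$E_Q\big[\tau_1\,1_A(Y^{(1)},\dots,Y^{(N)})\big]\ge(1-\varepsilon)\,m_Q\tag{A.22}$$

and

$$\big|\kappa(y^{(1)},\dots,y^{(N)})\big|\ge n(1+\varepsilon),\quad\tau(y^{(1)})\le\mathrm{tr},\quad\sum_{i=1}^N\big(\tau(y^{(i)})-\mathrm{tr}\big)^+\le N\varepsilon\qquad\forall\,(y^{(1)},\dots,y^{(N)})\in A,\tag{A.23}$$

while the set

$$B':=\Big\{\kappa\big([y^{(1)}]_{\rm tr},\dots,[y^{(N)}]_{\rm tr}\big)\big|_{(0,\lceil n(1-\varepsilon)\rceil]}\colon(y^{(1)},\dots,y^{(N)})\in A\Big\}\subset E^{\lceil n(1-\varepsilon)\rceil}\tag{A.24}$$

satisfies

$$|B'|\le\exp\big[n\big(H(\Psi_{[Q]_{\rm tr}})+\varepsilon\big)\big].\tag{A.25}$$

Put

$$B:=\Big\{\kappa(y^{(1)},\dots,y^{(N)})\big|_{(0,n]}\colon(y^{(1)},\dots,y^{(N)})\in A\Big\}\subset E^n.\tag{A.26}$$

Observe that each $x'\in B'$ corresponds to at most

$$\binom{n}{\varepsilon n}|E|^{\varepsilon n}\le\exp\big[-n\big(\varepsilon\log\varepsilon+(1-\varepsilon)\log(1-\varepsilon)\big)+n\varepsilon\log|E|\big]\tag{A.27}$$

different $x\in B$, so that

$$|B|\le|B'|\exp\big[-n\big(\varepsilon\log\varepsilon+(1-\varepsilon)\log(1-\varepsilon)\big)+n\varepsilon\log|E|\big].\tag{A.28}$$

We have

$$m_Q\,\Psi_Q(B\times E^{\mathbb N})\ge E_Q\Big[\sum_{k=0}^{\tau_1-1}1_{B\times E^{\mathbb N}}\big(\theta^k\kappa(Y)\big)\,1_A\big(Y^{(1)},\dots,Y^{(N)}\big)\Big]\ge\cdots\ge m_{[Q]_{\rm tr}}\,\Psi_{[Q]_{\rm tr}}(B'\times E^{\mathbb N})-\varepsilon\,m_Q,\tag{A.29}$$

so that, finally,

$$\Psi_Q(B\times E^{\mathbb N})\ge\frac{m_{[Q]_{\rm tr}}}{m_Q}\,\Psi_{[Q]_{\rm tr}}(B'\times E^{\mathbb N})-\varepsilon\ge\tfrac12.\tag{A.30}$$

Combining (A.25), (A.28) and (A.30), we obtain (A.20) with

$$\delta=-\big(\varepsilon\log\varepsilon+(1-\varepsilon)\log(1-\varepsilon)\big)+\varepsilon\,(1+\log|E|).\tag{A.31}$$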
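Bound (A.27) is the standard estimate $\binom{n}{k}\le\exp[nH_{\rm b}(k/n)]$ with $H_{\rm b}(\varepsilon)=-\varepsilon\log\varepsilon-(1-\varepsilon)\log(1-\varepsilon)$, and (A.31) collects the exponents of (A.25) and (A.28). The Python sketch below checks the counting bound numerically for a hypothetical alphabet size $|E|=4$ and a few pairs $(n,\varepsilon)$ with $\varepsilon n$ an integer, and evaluates $\delta(\varepsilon)$ to show that it vanishes as $\varepsilon\downarrow0$, which is the property used after (A.20).

```python
import math

def binary_entropy(e):
    return -(e * math.log(e) + (1 - e) * math.log(1 - e))

def log_insertions_exact(n, k, alphabet_size):
    """log of binom(n, k) |E|^k: positions and letters of k reinserted letters."""
    return math.log(math.comb(n, k)) + k * math.log(alphabet_size)

def log_insertions_bound(n, eps, alphabet_size):
    """Exponent on the right-hand side of (A.27)."""
    return n * binary_entropy(eps) + n * eps * math.log(alphabet_size)

def delta(eps, alphabet_size):
    """delta in (A.31)."""
    return binary_entropy(eps) + eps * (1 + math.log(alphabet_size))

E_size = 4                                  # hypothetical alphabet size |E|
for n in (20, 100, 1000):
    for eps in (0.05, 0.1, 0.25):
        k = round(eps * n)                  # eps * n is an integer for these choices
        print(n, eps, round(log_insertions_exact(n, k, E_size), 3),
              round(log_insertions_bound(n, eps, E_size), 3))
for eps in (0.1, 0.01, 0.001, 1e-6):
    print(eps, delta(eps, E_size))
```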



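Since $\limsup_{\mathrm{tr}\to\infty}H(\Psi_{[Q]_{\rm tr}})\le H(\Psi_Q)$ by upper semi-continuity of $H$ (see, e.g., Georgii [7], Proposition 15.14), this concludes the proof of (A.15) for ergodic $Q$.

For general $Q\in\mathcal P^{\rm inv,fin}(\widetilde E^{\mathbb N})$, we recall the ergodic decomposition formulas stated in (6.1)–(6.2). These yield

$$\Psi_{[Q]_{\rm tr}}=\int_{\mathcal P^{\rm erg,fin}(\widetilde E^{\mathbb N})}\frac{m_{[Q']_{\rm tr}}}{m_{[Q]_{\rm tr}}}\,\Psi_{[Q']_{\rm tr}}\,W_Q(dQ')\tag{A.32}$$

and

$$H\big(\Psi_{[Q]_{\rm tr}}\big)=\int_{\mathcal P^{\rm erg,fin}(\widetilde E^{\mathbb N})}\frac{m_{[Q']_{\rm tr}}}{m_{[Q]_{\rm tr}}}\,H\big(\Psi_{[Q']_{\rm tr}}\big)\,W_Q(dQ'),\tag{A.33}$$

because specific entropy is affine. The integrand in (A.33) is non-negative and, by the above, converges to $\frac{m_{Q'}}{m_Q}H(\Psi_{Q'})$ as $\mathrm{tr}\to\infty$. Hence, by Fatou's lemma,

$$\liminf_{\mathrm{tr}\to\infty}H\big(\Psi_{[Q]_{\rm tr}}\big)\ge\int_{\mathcal P^{\rm erg,fin}(\widetilde E^{\mathbb N})}\frac{m_{Q'}}{m_Q}\,H(\Psi_{Q'})\,W_Q(dQ')=H(\Psi_Q),\tag{A.34}$$

which concludes the proof. □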